General Notation

The following notation is used throughout the chapters of this thesis.

Sets, Numbers, Functions

$\mathrm{Card}(\Omega)$ : Cardinality of the set $\Omega$.
$\lfloor x \rfloor$ : Integer part of the real number $x$.
$a \vee b$ : Maximum of the real numbers $a$ and $b$.
$a \wedge b$ : Minimum of the real numbers $a$ and $b$.
$\mathbb{1}_A$ : Indicator function, equal to 1 on the set $A$ and 0 elsewhere.
$f^{(k)}$ : $k$-th derivative of the function $f$.

Random variables

Let $X$ and $Y$ be two random variables.
$\mathbb{E}(X)$ : Mathematical expectation of $X$.
$\mathrm{Var}(X)$ : Variance of $X$.
$\mathrm{Cov}(X, Y)$ : Covariance of $X$ and $Y$.
$\|X\|_p$ : $L^p$ norm ($p \in \,]0, \infty[$) of $X$, defined by $\|X\|_p = \left(\mathbb{E}|X|^p\right)^{1/p}$, with $\mathbb{E}|X|^p < \infty$.

Abbreviations and Symbols

$:=$ : Symbol used to define a quantity.
Let $(a_n)_{n \geq 1}$ and $(b_n)_{n \geq 1}$ be two real sequences.

$a_n = o(b_n)$, $n \to \infty$ : For every real $\epsilon > 0$, $|a_n / b_n| \leq \epsilon$ for $n$ large enough.

$a_n = O(b_n)$, $n \to \infty$ : There exists a real $C > 0$ such that $|a_n / b_n| \leq C$ for $n$ large enough.

$a_n \asymp b_n$, $n \to \infty$ : $a_n = O(b_n)$ and $b_n = O(a_n)$ for $n$ large enough.



Chapter 1

General Introduction



1.1 Presentation of the subject

Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be a sample of independent and identically distributed (i.i.d.) random variables with the same distribution as $(X, Y)$. We assume that $Y$ is a univariate variable taking values in $\mathbb{R}$, and that $X$ is a multivariate explanatory variable taking values in $\mathbb{R}^d$, $d \geq 1$. Let $m(\cdot)$ be the conditional expectation of $Y$ given $X$, so that the regression model relating $X$ and $Y$ reads

\[ Y_i = m(X_i) + \varepsilon_i, \quad i = 1, \dots, n, \tag{1.1} \]

where the errors $\varepsilon_i$ are assumed to be i.i.d. random variables, independent of the $X_i$, with the same distribution as $\varepsilon$, satisfying in particular $\mathbb{E}[\varepsilon] = 0$.

In this thesis we study the nonparametric estimation of the density $f$ of the error in model (1.1). Estimating the density of the regression error is an important descriptive tool: it helps to understand the behaviour of the residuals and to test hypotheses on the error distribution or on the regression function. See, for example, Ahmad and Li (1997), Dette et al. (2002), Neumeyer et al. (2005) for tests of symmetry of the regression error distribution; Akritas and Van Keilegom (2001), Cheng and Sun (2008) for goodness-of-fit tests on the distribution of the residuals; Gozalo and Linton (2001), Dette and von Lieres und Wilkau (2001), Neumeyer and Van Keilegom (2010) for tests of additivity of the regression function. The estimation of $f$ also matters for predicting $Y_{n+1}$ from $X_{n+1}$. Indeed, $Y_{n+1}$ can be predicted by an estimator of the conditional mode $\mathrm{mod}(x)$ of $Y_{n+1}$ given $X_{n+1} = x$, since $\mathrm{mod}(x) = m(x) + \arg\max_{e \in \mathbb{R}} f(e)$. Estimating $f$ is also essential for building a prediction interval for $Y_{n+1}$, which requires estimating quantiles of the distribution with density $f$. The estimation of $f$ can further be used to estimate the distribution of the variable $Y$, as reported in Escanciano and Jacho-Chavez (2010). Finally, this estimate of the residual distribution can be useful for constructing nonparametric estimators of the density and of the hazard function of $Y$ given $X$; see Van Keilegom and Veraverbeke (2002).



To estimate the density $f$ of the residuals in model (1.1), a first approach is to note that $f$ can be deduced from the density $\varphi(\cdot \mid x)$ of $Y$ given $X = x$. More precisely, we have the relation

\[ f(e) = \varphi\left(e + m(x) \mid x\right). \tag{1.2} \]

Following this idea, one can in theory derive an estimator of $f(e)$ from estimates of $\varphi(y \mid x)$ and $m(x)$. This approach, however, suffers from the "curse of dimensionality": $\varphi(y \mid x)$ can only be estimated at a very slow rate when the dimension of $x$ is high. The approaches proposed in this thesis aim to "decondition" the expression (1.2) of $f(e)$. Indeed, relation (1.2) implies that

\[ f(e) = \int \varphi\left(e + m(x) \mid x\right) g(x)\, dx, \tag{1.3} \]



where $g(x)$ denotes the density of $X$. This new formula suggests that the "curse of dimensionality" may not be as severe as the first approach, based on estimates of $\varphi(y \mid x)$ and $m(x)$, would suggest. Two strategies are implemented in this thesis to try to avoid the "curse of dimensionality". The first consists in estimating each residual $\varepsilon_i$ nonparametrically by $\hat\varepsilon_i = Y_i - \hat m_n(X_i)$, where $\hat m_n(\cdot)$ denotes a nonparametric estimator of the regression function $m(\cdot)$. The second consists in proceeding as in (1.3) and studying the estimator

\[ \hat f_n(e) = \int \hat\varphi_n\left(e + \hat m_n(x) \mid x\right) \hat g_n(x)\, dx, \]

where $\hat\varphi_n(\cdot \mid x)$ and $\hat g_n(x)$ denote nonparametric estimators of $\varphi(\cdot \mid x)$ and $g(x)$, respectively.

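Estimating the density of the residuals of a regression model is a special case of a more general problem: estimating a parameter of interest in the presence of a nuisance parameter. In our setting, which focuses on the estimation of the residual distribution, the residual density $f(\cdot)$ is the parameter of interest and the regression function $m(\cdot)$ is the nuisance parameter. The presence of this nuisance parameter in the model influences the estimation of the parameter of interest. In the parametric case, consider for example a sample $Z, Z_1, \dots, Z_n$ of independent and identically distributed random variables with density $f(z \mid \eta, \theta)$, where $\theta$ is the parameter of interest and $\eta$ the nuisance parameter. A central quantity linking these two parameters is the Fisher information matrix

\[ I(\eta, \theta) = \mathrm{Var}\left[\frac{\nabla f(Z \mid \eta, \theta)}{f(Z \mid \eta, \theta)}\right], \]

where $\nabla f(z \mid \eta, \theta)$ is the gradient of $f(z \mid \eta, \theta)$ with respect to $\eta$ and $\theta$, defined by

\[ \nabla f(z \mid \eta, \theta) = \begin{pmatrix} \dfrac{\partial f(z \mid \eta, \theta)}{\partial \eta} \\[2ex] \dfrac{\partial f(z \mid \eta, \theta)}{\partial \theta} \end{pmatrix}. \]

The matrix $I(\eta, \theta)$ can be written as a block matrix

\[ I(\eta, \theta) = \begin{pmatrix} I_{\eta\eta} & I_{\eta\theta} \\ I_{\theta\eta} & I_{\theta\theta} \end{pmatrix}, \]

where

\[ I_{\theta\theta} = \mathrm{Var}\left[\frac{\partial f(Z \mid \eta, \theta)/\partial\theta}{f(Z \mid \eta, \theta)}\right], \quad I_{\eta\eta} = \mathrm{Var}\left[\frac{\partial f(Z \mid \eta, \theta)/\partial\eta}{f(Z \mid \eta, \theta)}\right], \quad I_{\theta\eta} = I_{\eta\theta}^\top = \mathrm{Cov}\left[\frac{\partial f(Z \mid \eta, \theta)/\partial\theta}{f(Z \mid \eta, \theta)}, \frac{\partial f(Z \mid \eta, \theta)/\partial\eta}{f(Z \mid \eta, \theta)}\right]. \]
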



The Fréchet-Darmois-Cramér-Rao inequality (Borovkov 1987, page 156) shows that the inverse of the Fisher information matrix, $I^{-1}(\eta, \theta)$, is, in the sense of the ordering on matrices, the smallest possible variance matrix for unbiased estimators of $(\eta, \theta)$. This bound $I^{-1}(\eta, \theta)$ is attained asymptotically by maximum likelihood estimators, as recalled in the following theorem.

Theorem 1.1. (Borovkov 1987, page 229)

Let $(\hat\eta_n, \hat\theta_n)$ be a maximum likelihood estimator of $(\eta, \theta)$. Under certain regularity conditions, the following asymptotic convergence holds:

\[ \sqrt{n}\left( \begin{pmatrix} \hat\eta_n \\ \hat\theta_n \end{pmatrix} - \begin{pmatrix} \eta \\ \theta \end{pmatrix} \right) \xrightarrow{d} \mathcal{N}\left(0, I^{-1}(\eta, \theta)\right). \]


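The formula for the inverse of a block matrix, applied to $I(\eta, \theta)$, shows that

\[ I^{-1}(\eta, \theta) = \begin{pmatrix} I^{\eta\eta} & I^{\eta\theta} \\ I^{\theta\eta} & I^{\theta\theta} \end{pmatrix}, \]

with

\[ I^{\theta\theta} = \left(I_{\theta\theta} - I_{\theta\eta} I_{\eta\eta}^{-1} I_{\eta\theta}\right)^{-1}. \]

From the previous theorem we deduce the limit distribution of the estimator of the parameter of interest $\theta$.

Corollary 1.1. Under the conditions of the previous theorem, the following asymptotic convergence holds:

\[ \sqrt{n}\left(\hat\theta_n - \theta\right) \xrightarrow{d} \mathcal{N}\left(0, I^{\theta\theta}\right). \]

By the Fréchet-Darmois-Cramér-Rao inequality, the matrix $I^{\theta\theta}$ can be interpreted as the best possible variance for an unbiased estimator of $\theta$ when $\eta$ is unknown. Since $I_{\theta\eta} I_{\eta\eta}^{-1} I_{\eta\theta}$ is positive semidefinite, the formula for $I^{\theta\theta}$ shows that $I^{\theta\theta}$ is, in the sense of the ordering on symmetric matrices, larger than $I_{\theta\theta}^{-1}$ unless $I_{\eta\theta} = 0$, a condition indicating that the maximum likelihood estimators of $\theta$ and $\eta$ are asymptotically independent. Since the asymptotic variance of the estimator of $\theta$ when $\eta$ is known is $I_{\theta\theta}^{-1}$, the difference between $I^{\theta\theta}$ and $I_{\theta\theta}^{-1}$ measures the loss (in terms of efficiency) incurred because $\eta$ is unknown when estimating $\theta$.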
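
The efficiency loss described above can be made concrete when $\theta$ and $\eta$ are both scalar. The following is a minimal sketch under that assumption; the function name `asymptotic_variances` and the numerical information values are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def asymptotic_variances(i_tt: float, i_te: float, i_ee: float) -> Tuple[float, float]:
    """Return (variance with eta known, variance with eta unknown).

    i_tt, i_te, i_ee are the scalar blocks I_theta_theta, I_theta_eta and
    I_eta_eta of a 2x2 Fisher information matrix (illustrative inputs).
    """
    if i_tt <= 0 or i_ee <= 0 or i_tt * i_ee - i_te ** 2 <= 0:
        raise ValueError("the information matrix must be positive definite")
    var_known = 1.0 / i_tt                            # inverse of I_theta_theta
    var_unknown = 1.0 / (i_tt - i_te ** 2 / i_ee)     # inverse of the Schur complement
    return var_known, var_unknown


# Correlated scores: not knowing eta inflates the variance bound for theta.
known, unknown = asymptotic_variances(2.0, 1.0, 1.0)
loss = unknown - known

# Orthogonal scores (I_theta_eta = 0): no loss at all.
known_orth, unknown_orth = asymptotic_variances(2.0, 0.0, 1.0)
```

With these illustrative values the bound doubles when $\eta$ is unknown, while it is unchanged in the orthogonal case.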

Another situation close to the problem of estimating the residual density is the estimation of a distribution function when some parameters are unknown. Consider, for example, a sample $X_1, \dots, X_n$ of i.i.d. random variables with common distribution function $F(x, \theta)$, where $\theta \in \mathbb{R}$. For an estimator $\hat\theta_n$ of $\theta$, define the associated empirical function

\[ \hat F_n(t) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\left(F(X_i, \hat\theta_n) \leq t\right), \quad t \in [0, 1]. \]

This empirical distribution function plays an important role in goodness-of-fit tests for the model under consideration. Indeed, $\hat F_n(t)$ should be close to $t$ if the model is correctly specified. Consider, for example, the location model

\[ X_i = \theta + \varepsilon_i, \quad i = 1, \dots, n, \]

where the residuals $\varepsilon_i$ have common distribution function $\psi$. Then $F(x, \theta) = \psi(x - \theta)$. For this parametric model we have

\[ F(X_i, \hat\theta_n) = \psi(X_i - \hat\theta_n) = \psi(\hat\varepsilon_i), \]

where $\hat\varepsilon_i$ is the estimated residual $X_i - \hat\theta_n$. Consequently,

\[ \hat F_n(t) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\left(\psi(\hat\varepsilon_i) \leq t\right) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\left(\hat\varepsilon_i \leq \psi^{-1}(t)\right). \]

The relation above therefore shows that $\hat F_n(t)$ is, up to a transformation of $t$, the empirical distribution function of the residuals $\hat\varepsilon_i$. The empirical process associated with $\hat F_n$ is

\[ \hat y_n(t) = n^{1/2}\left\{\hat F_n(t) - t\right\}, \quad t \in [0, 1]. \]

This process was studied by Durbin (1973), who obtained the following result.
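Theorem 1.2. Let $\hat\theta_n$ be an estimator of $\theta$ such that

\[ n^{1/2}\left(\hat\theta_n - \theta\right) = \frac{1}{n^{1/2}} \sum_{i=1}^{n} \ell(X_i, \theta) + o_{\mathbb{P}}(1), \]

where $\ell$ is a measurable function such that $\mathbb{E}[\ell(X_i, \theta)] = 0$. For every $t \in [0, 1]$, define the function $g(t)$ by

\[ g(t) = g(t, \theta) = \left.\frac{\partial F(x, \theta)}{\partial \theta}\right|_{x = Q(t, \theta)}, \quad Q(t, \theta) = \inf\{x : F(x, \theta) = t\}, \]

and set

\[ h(t) = \int_{-\infty}^{Q(t, \theta)} \ell(x, \theta)\, dF(x, \theta), \quad L(\theta) = \mathbb{E}\left[\ell^2(X_1, \theta)\right]. \]

Then, under regularity conditions, the process $\{\hat y_n(t), 0 \leq t \leq 1\}$ converges in distribution to a Gaussian process $\{y(t), 0 \leq t \leq 1\}$ with zero mean and covariance function

\[ \mathrm{Cov}\left(y(t_1), y(t_2)\right) = \min(t_1, t_2) - t_1 t_2 - h(t_1) g(t_2) - h(t_2) g(t_1) + g(t_1) L(\theta) g(t_2). \]
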

Note that this covariance function depends on the unknown distribution function $F(\cdot, \theta)$. Hence the asymptotic distribution obtained for the process $\hat y_n(t)$ differs from the limit distribution of the usual empirical process (which assumes that $\theta$ is known),

\[ y_n(t) = n^{1/2}\left\{F_n(t) - t\right\}, \quad F_n(t) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\left(F(X_i, \theta) \leq t\right). \]

Indeed, it has been shown that the process $\{y_n(t), 0 \leq t \leq 1\}$ converges in distribution to a Brownian bridge. See, for example, the book by Billingsley (1968, p. 109).
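
The identity between the transformed empirical function and the empirical distribution function of the estimated residuals in the location model can be checked numerically. The sketch below assumes standard normal residuals (so that $\psi$ is the standard normal distribution function) and uses the sample mean as $\hat\theta_n$; both choices, like the function names, are illustrative assumptions and not part of the original text.

```python
import random
from statistics import NormalDist, fmean

PSI = NormalDist()  # assumed residual distribution: standard normal


def transformed_ecdf(sample, t, theta_hat):
    """(1/n) * #{i : psi(X_i - theta_hat) <= t}, for t in [0, 1]."""
    return sum(PSI.cdf(x - theta_hat) <= t for x in sample) / len(sample)


def residual_ecdf(sample, u, theta_hat):
    """Empirical distribution function of the estimated residuals at u."""
    residuals = [x - theta_hat for x in sample]
    return sum(e <= u for e in residuals) / len(residuals)


rng = random.Random(0)
theta = 2.0
sample = [theta + rng.gauss(0.0, 1.0) for _ in range(500)]
theta_hat = fmean(sample)  # illustrative estimator of the location parameter

t = 0.3
lhs = transformed_ecdf(sample, t, theta_hat)
rhs = residual_ecdf(sample, PSI.inv_cdf(t), theta_hat)
```

Both quantities count the same residuals, and `lhs` should stay close to `t` because the model is correctly specified here.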

The rest of this general introduction gives examples of parameter estimation in a regression model $Y = m(X) + \sigma(X)\varepsilon$. The examples are organised according to whether the nuisance parameter, here the regression function $m(\cdot)$, is parametric or not.

1.2 Estimation of the distribution function of the residuals of a linear model

Consider the linear model

\[ Y_i = \theta^\top X_i + \varepsilon_i, \quad i = 1, \dots, n, \tag{1.4} \]

where the errors $\varepsilon_i$ are i.i.d. with common distribution function $F$. The variables $X_i$ are assumed to be non-random. Let $\hat\theta_n$ be an M-estimator of $\theta$ (see, for example, Huber 1964, 1981). We are interested in the asymptotic behaviour of the empirical distribution function $\hat F_n$ of the estimated residuals $\hat\varepsilon_i = Y_i - X_i^\top \hat\theta_n$,

\[ \hat F_n(t) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\left(\hat\varepsilon_i \leq t\right), \quad t \in \mathbb{R}, \]

when the dimension $p$ of the regressors may depend on the sample size $n$. This problem was studied by Portnoy (1986) and Mammen (1996). Portnoy (1986) obtains the expansion

\[ n^{1/2}\left(\hat F_n(t) - F_n(t)\right) = \frac{f(t)}{n^{1/2}} \sum_{i=1}^{n} X_i^\top\left(\hat\theta_n - \theta\right) + o_{\mathbb{P}}(1), \tag{1.5} \]




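where $F_n(t)$ is the empirical distribution function based on the true residuals and $f$ denotes the density of the residuals. He then shows that the expansion (1.5) holds only if $p^2/n = O(1)$ as $n$ tends to infinity. Mammen (1996) studies the asymptotic behaviour of $\hat F_n$ when $p^2/n$ diverges. He considers an M-estimator $\hat\theta_n$ that is approximated, up to a remainder whose order is controlled in probability, by a linear statistic in the terms $X_i G(\varepsilon_i)$, where $G$ is a function defined on $\mathbb{R}$ such that $\mathbb{E}[G(\varepsilon_1)] = 0$, built from a differentiable and increasing function $\psi$. Under regularity conditions, Mammen shows that for every $0 < C < \infty$,

\[ \sup_{|t| \leq C} \left| n^{1/2}\left(\hat F_n(t) - F_n(t)\right) - \Delta_n(t) \right| = o_{\mathbb{P}}(1), \tag{1.6} \]

where, if $f$ denotes the density of the residuals,

\[ \Delta_n(t) = \frac{f(t)\, p}{n^{1/2}} \left\{ G(t) + \frac{f'(t)}{2 f(t)}\, \mathbb{E}\, G^2(\varepsilon_1) \right\}. \]
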

In Portnoy's result (1.5), the estimation of the residuals has no asymptotic influence on the estimator of the distribution $F(t)$ when

\[ \frac{1}{n^{1/2}} \sum_{i=1}^{n} X_i^\top\left(\hat\theta_n - \theta\right) = \frac{1}{n} \sum_{i=1}^{n} X_i^\top \sqrt{n}\left(\hat\theta_n - \theta\right) = o_{\mathbb{P}}(1). \]

Hence, since $\sqrt{n}(\hat\theta_n - \theta) = O_{\mathbb{P}}(1)$ under the usual regularity assumptions, the condition above holds when $\mathbb{E}[X] = 0$, by the law of large numbers. For Mammen's result (1.6), the estimation of the residuals does have an effect. Indeed, the term $\Delta_n(t)$ cannot be negligible since $p^2/n$ diverges.

The estimation of the residual distribution has also been studied for linear autoregressive models. In the autoregressive model of order 1, AR(1), one observes random variables $X_0, X_1, \dots, X_n$ such that

\[ X_i = \rho X_{i-1} + \varepsilon_i, \quad 1 \leq i \leq n, \]

where $\rho$ denotes a real parameter and the $\varepsilon_i$ are independent and identically distributed (i.i.d.) random variables with probability density $f$ defined on $\mathbb{R}$. To estimate the distribution function $F$ of the residuals, one first estimates the residuals $\varepsilon_i$ by $\hat\varepsilon_i = X_i - \hat\rho_n X_{i-1}$, where $\hat\rho_n$ can be obtained by ordinary least squares. The following theorem, obtained by Koul (1992), gives an idea of the effect of estimating the residuals on the limit distribution of the estimator of $F$. Here $F_n(x, r)$ denotes the empirical distribution function of the variables $X_i - r X_{i-1}$, $1 \leq i \leq n$.

Theorem 1.3. Let $\hat\rho_n$ be an estimator of $\rho$ such that $n^{1/2}(\hat\rho_n - \rho) = O_{\mathbb{P}}(1)$. Then, under an ergodicity assumption on the family $\{\varepsilon_i, 1 \leq i \leq n\}$ and under other suitable assumptions,

\[ \sup_{x \in \mathbb{R}} \left| n^{1/2}\left[F_n(x, \hat\rho_n) - F_n(x, \rho)\right] \right| = o_{\mathbb{P}}(1). \]

This theorem shows that estimating the parameter $\rho$ has no asymptotic effect on the estimation of the distribution function $F$ of the residuals of the model above. This is because the AR(1) model is very close to the linear model (1.4), the variables $X_i$ having zero mean.
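
The negligible effect of estimating $\rho$ can be illustrated by a small simulation. The sketch below is only an illustration under stated assumptions: standard normal innovations, the starting value $X_0 = 0$ (so the series is not exactly stationary), a finite evaluation grid instead of the supremum over $\mathbb{R}$, and hypothetical helper names.

```python
import random


def simulate_ar1(n, rho, rng):
    """Simulate X_0 = 0, X_i = rho * X_{i-1} + eps_i with N(0, 1) innovations."""
    x, eps = [0.0], []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        eps.append(e)
        x.append(rho * x[-1] + e)
    return x, eps


def ols_rho(x):
    """Ordinary least squares estimator of rho (regression without intercept)."""
    num = sum(x[i] * x[i - 1] for i in range(1, len(x)))
    den = sum(x[i - 1] ** 2 for i in range(1, len(x)))
    return num / den


def ecdf(values, t):
    """Empirical distribution function of `values` at the point t."""
    return sum(v <= t for v in values) / len(values)


rng = random.Random(1)
rho = 0.5
x, eps = simulate_ar1(2000, rho, rng)
rho_hat = ols_rho(x)
residuals = [x[i] - rho_hat * x[i - 1] for i in range(1, len(x))]

# Largest gap, over a grid, between the residual-based and the true-error ecdf.
grid = [-2.0 + 0.1 * k for k in range(41)]
sup_gap = max(abs(ecdf(residuals, t) - ecdf(eps, t)) for t in grid)
```

In such a run the gap `sup_gap` is expected to be much smaller than the $n^{-1/2}$ fluctuations of either empirical distribution function, in line with Theorem 1.3.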



1.3 Estimation of moments of a functional of the error

The distribution function corresponds to a particular moment, namely the moment of the function $\mathbb{1}(\varepsilon \leq t)$. Müller, Schick and Wefelmeyer (2004) studied the more general case of a moment $\mathbb{E}h(\varepsilon)$, but assuming that $h$ is differentiable. Their framework is the nonparametric regression model $Y = m(X) + \varepsilon$, where $\varepsilon$ is independent of $X$. The function $h$ is assumed to be known. The model is based on a sample of i.i.d. observations $(X_1, Y_1), \dots, (X_n, Y_n)$ with the same distribution as $(X, Y)$. The residuals $\varepsilon_i$ are estimated by $\hat\varepsilon_i = Y_i - \hat m(X_i)$, where $\hat m$ is a nonparametric estimator of $m$. The authors propose to estimate $\mathbb{E}[h(\varepsilon)]$ by $\hat H_n = n^{-1} \sum_{i=1}^{n} h(\hat\varepsilon_i)$. Under regularity conditions, they show that $\hat H_n$ is an efficient estimator of $\mathbb{E}[h(\varepsilon)]$ such that

\[ \hat H_n = \frac{1}{n} \sum_{i=1}^{n} \left[ h(\varepsilon_i) - \mathbb{E}\left[h^{(1)}(\varepsilon)\right] \varepsilon_i \right] + o_{\mathbb{P}}\left(n^{-1/2}\right). \]

Consequently, the quantity $n^{1/2}[\hat H_n - \mathbb{E}h(\varepsilon)]$ converges in distribution to a normal distribution with zero mean and variance

\[ \hat\tau^2 = \mathbb{E}\left[ \left( h(\varepsilon) - \mathbb{E}h(\varepsilon) - \mathbb{E}\left[h^{(1)}(\varepsilon)\right] \varepsilon \right)^2 \right]. \]

A surprising aspect of this result is that, for some functions $h$, the asymptotic variance $\hat\tau^2$ of $\hat H_n$ is smaller than the asymptotic variance $\tau^2$ of the estimator $H_n = n^{-1} \sum_{i=1}^{n} h(\varepsilon_i)$ based on the true residuals. Indeed, suppose for example that the residuals follow a normal distribution with zero mean and variance $\sigma^2$. For simplicity, assume $\sigma^2 = 1$. Since the asymptotic variance of the estimator $H_n$ equals $\tau^2 = \mathbb{E}[(h(\varepsilon) - \mathbb{E}h(\varepsilon))^2]$, expanding the square gives $\hat\tau^2 - \tau^2 = \mathbb{E}[h^{(1)}(\varepsilon)]\left(\mathbb{E}[h^{(1)}(\varepsilon)] - 2\mathbb{E}[\varepsilon h(\varepsilon)]\right)$, so that $\hat\tau^2 < \tau^2$ if and only if

\[ 0 < \mathbb{E}\left[h^{(1)}(\varepsilon)\right] < 2\,\mathbb{E}[\varepsilon h(\varepsilon)] \quad \text{or} \quad 2\,\mathbb{E}[\varepsilon h(\varepsilon)] < \mathbb{E}\left[h^{(1)}(\varepsilon)\right] < 0. \tag{1.7} \]

Moreover, when $\varepsilon$ follows a normal distribution with variance $\sigma^2 = 1$, we have, under suitable assumptions, $\mathbb{E}[h^{(1)}(\varepsilon)] = \mathbb{E}[\varepsilon h(\varepsilon)]$. Consequently, the first double inequality in (1.7) holds when $\mathbb{E}[h(\varepsilon)\varepsilon] > 0$, whereas the second double inequality in (1.7) holds when $\mathbb{E}[h(\varepsilon)\varepsilon] < 0$. The first condition is satisfied, for example, when $h(z) = z^3$, with $\varepsilon$ following a standard normal distribution, which in that case implies $\hat\tau^2 < \tau^2$. This paradox is explained by the fact that the estimator $\hat H_n$ makes better use of the fact that the residuals $\varepsilon_i$ have zero mean.
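
The comparison between the two asymptotic variances can be worked out exactly for polynomial $h$ and standard normal errors, using the moments $\mathbb{E}[\varepsilon^{2k}] = (2k-1)!!$. The sketch below is an illustrative computation under those assumptions; polynomials are represented by hypothetical coefficient lists `[c_0, c_1, ...]`.

```python
def normal_moment(k):
    """E[eps^k] for eps ~ N(0, 1): 0 for odd k, (k - 1)!! for even k."""
    if k % 2 == 1:
        return 0
    result = 1
    for j in range(k - 1, 0, -2):
        result *= j
    return result


def expect_poly(coeffs):
    """E[sum_k coeffs[k] * eps^k] under the standard normal distribution."""
    return sum(c * normal_moment(k) for k, c in enumerate(coeffs))


def poly_mul(a, b):
    """Coefficients of the product of two polynomials."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_derivative(a):
    """Coefficients of the derivative of a polynomial."""
    return [k * c for k, c in enumerate(a)][1:]


def asymptotic_variances(h):
    """Return (tau^2, tau_hat^2) for polynomial h and N(0, 1) errors."""
    mean_h = expect_poly(h)
    tau2 = expect_poly(poly_mul(h, h)) - mean_h ** 2
    mean_dh = expect_poly(poly_derivative(h))        # E[h'(eps)]
    mean_eps_h = expect_poly(poly_mul([0, 1], h))    # E[eps * h(eps)]
    tau_hat2 = tau2 - 2 * mean_dh * mean_eps_h + mean_dh ** 2
    return tau2, tau_hat2


tau2, tau_hat2 = asymptotic_variances([0, 0, 0, 1])  # h(z) = z^3
```

For $h(z) = z^3$ this gives $\tau^2 = 15$ and $\hat\tau^2 = 15 - 2 \cdot 3 \cdot 3 + 3^2 = 6$, so the estimator based on estimated residuals has the smaller asymptotic variance.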
1.4 Nonparametric estimation of the error density in a nonlinear autoregressive model

Fu and Yang (2008) study the asymptotic distribution of a kernel estimator of the error density in a nonlinear AR($p$) model. This model is of the form

\[ X_i = g_\theta(X_{i-1}, \dots, X_{i-p}) + \varepsilon_i, \quad i \geq 1, \]

where $\{X_i, i \in \mathbb{Z}\}$ is strictly stationary and $\theta = (\theta_1, \dots, \theta_q)^\top \in \mathbb{R}^q$. The $\varepsilon_i$ are i.i.d. with density $f$, zero mean and variance $\sigma^2$. It is also assumed that the residuals $\varepsilon_i$ are independent of the family $(X_{i-1}, \dots, X_{i-p})$. For an estimator $\hat\theta = (\hat\theta_1, \dots, \hat\theta_q)^\top$, the residuals $\varepsilon_i$ are estimated by

\[ \hat\varepsilon_i = X_i - g_{\hat\theta}(X_{i-1}, \dots, X_{i-p}), \quad i \geq 1. \]

Using these empirical residuals, Fu and Yang estimate the density $f$ nonparametrically by

\[ \hat f_n(t) = \frac{1}{n h_n} \sum_{i=1}^{n} K\left(\frac{t - \hat\varepsilon_i}{h_n}\right), \]

where $(h_n)$ is a sequence of positive reals tending to zero as $n$ tends to infinity, and $K$ is a kernel function defined on $\mathbb{R}$. Denoting by

\[ f_n(t) = \frac{1}{n h_n} \sum_{i=1}^{n} K\left(\frac{t - \varepsilon_i}{h_n}\right) \]

the nonparametric estimator of $f$ based on the true residuals, Fu and Yang obtain the following result.

Theorem 1.4. Fu and Yang (2008)

Suppose there exists a real $C_1 > 0$ such that the estimator $\hat\theta$ satisfies, with probability 1,

\[ \limsup_{n \to \infty} \sqrt{\frac{n}{\log\log n}}\, \left\|\hat\theta - \theta\right\| \leq C_1, \tag{1.8} \]

where $\|x\|^2 = \sum_{j=1}^{q} x_j^2$ for every $x = (x_1, \dots, x_q)^\top \in \mathbb{R}^q$. Suppose also that the bandwidth $h_n$ satisfies

\[ h_n \to 0, \quad \lim_{n \to \infty} \frac{n h_n^5}{\log\log n} = \infty. \tag{1.9} \]

Then, under certain regularity conditions, the following convergence in distribution holds:

\[ \frac{1}{\sqrt{\mathrm{Var} f_n(t)}} \left(\hat f_n(t) - \mathbb{E} f_n(t)\right) \xrightarrow{d} \mathcal{N}(0, 1), \]

where $\mathcal{N}(0, 1)$ denotes the standard normal distribution.



Condition (1.8) is satisfied by a maximum likelihood estimator under certain conditions proposed by Klimko and Nelson (1978).

It has been shown in the statistical literature that $n^{-1/5}$ is the order of the optimal bandwidth for the nonparametric estimation of the density of a real random variable $\zeta$ from a sample of i.i.d. random variables $\zeta_1, \zeta_2, \dots, \zeta_n$. For this result one may refer, for example, to the books by Bosq and Lecoutre (1987), Scott (1992), and Wand and Jones (1995). Note that, in the setting of the previous theorem, condition (1.9) cannot hold when $h_n$ is of order $n^{-1/5}$, but all orders $n^{-(1/5) + \epsilon}$, $\epsilon > 0$, that approach it are possible.
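
The two steps described in this section, computing autoregressive residuals and smoothing them with a kernel, can be sketched as follows. This is a minimal illustration, not Fu and Yang's implementation: the Gaussian kernel, the fitted function `g_hat`, the toy series and the bandwidth are all hypothetical choices.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def ar_residuals(x, g, p):
    """Residuals X_i - g(X_{i-1}, ..., X_{i-p}) for every i with p lags available."""
    return [x[i] - g(*(x[i - j] for j in range(1, p + 1))) for i in range(p, len(x))]


def kernel_density(t, residuals, h, kernel=gaussian_kernel):
    """Kernel estimate (1 / (n h)) * sum_i K((t - e_i) / h) of the error density at t."""
    if h <= 0:
        raise ValueError("the bandwidth h must be positive")
    return sum(kernel((t - e) / h) for e in residuals) / (len(residuals) * h)


# Toy series and a hypothetical fitted autoregression function with p = 1.
series = [0.0, 1.0, 0.2, -0.4, 0.9]
g_hat = lambda prev: 0.5 * math.tanh(prev)
res = ar_residuals(series, g_hat, p=1)
density_at_zero = kernel_density(0.0, res, h=0.5)
```

Passing the true errors instead of `res` to `kernel_density` gives the benchmark estimator based on the true residuals.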
1.5 Estimation of the distribution of the residuals in nonparametric regression

The nonparametric estimation of an error distribution in a nonparametric regression model occupies an important place in the statistical literature. Indeed, several results on this type of estimation were obtained at the beginning of this decade. One can cite, for example, Akritas and Van Keilegom (2001) for the nonparametric estimation of the distribution function of the error in a heteroscedastic regression model, then Efromovich (2005, 2007) and Cheng (2005) for the nonparametric estimation of the residual density in a homoscedastic regression model. More recently, Wang, Brown, Cai and Levine (2008) studied the influence of the conditional mean function, assumed to be unknown, on the estimation of the conditional variance of the residuals in a heteroscedastic regression model.



1.5.1 Estimation of the distribution function of the residuals in a heteroscedastic regression model

Akritas and Van Keilegom (2001) propose a nonparametric estimator of the distribution function $F_\varepsilon$ of the error $\varepsilon$ in the heteroscedastic regression model $Y = m(X) + \sigma(X)\varepsilon$, where $\varepsilon$ is independent of $X$, and $m$ and $\sigma$ are "smooth" functions satisfying some regularity conditions. The estimator $\hat F_\varepsilon$ of $F_\varepsilon$ is based on the nonparametric estimation of the residuals $\varepsilon_i = (Y_i - m(X_i))/\sigma(X_i)$, where $(X, Y), (X_1, Y_1), \dots, (X_n, Y_n)$ denote a sample of independent and identically distributed observations. To estimate these residuals, Akritas and Van Keilegom write $m(x)$ in the form

\[ m(x) = \int_0^1 F^{-1}(s \mid x)\, ds, \tag{1.10} \]

where $F^{-1}(s \mid x) = \inf\{y \in \mathbb{R} : F(y \mid x) \geq s\}$ and $F(y \mid x) = \mathbb{P}(Y \leq y \mid X = x)$. Note that if the function $F$ is continuous, the change of variable $s = F(u \mid x)$ in (1.10) gives

\[ \int_0^1 F^{-1}(s \mid x)\, ds = \int_{\mathbb{R}} u\, dF(u \mid x) = \mathbb{E}[Y \mid X = x] = m(x). \]

To estimate $F_\varepsilon$, the authors first estimate $F(y \mid x)$ by the estimator of Stone (1977),

\[ \hat F(y \mid x) = \sum_{i=1}^{n} W_i(x, a_n)\, \mathbb{1}(Y_i \leq y), \]

where the $W_i(x, a_n)$ are the Nadaraya-Watson (1964) weights defined by

\[ W_i(x, a_n) = \frac{K\left(\frac{x - X_i}{a_n}\right)}{\sum_{j=1}^{n} K\left(\frac{x - X_j}{a_n}\right)}, \]

with $K$ a kernel function and $a_n$ a bandwidth tending to 0 as $n$ tends to infinity. In a second step, Akritas and Van Keilegom estimate $m(x)$ and $\sigma^2(x)$ by

\[ \hat m(x) = \int_0^1 \hat F^{-1}(s \mid x)\, ds, \quad \hat\sigma^2(x) = \int_0^1 \hat F^{-1}(s \mid x)^2\, ds - \hat m^2(x). \]

It is worth noting again that the change of variable $s = \hat F(y \mid x)$ gives

\[ \int_0^1 \hat F^{-1}(s \mid x)\, ds = \sum_{i=1}^{n} Y_i W_i(x, a_n), \]

which is the classical Nadaraya-Watson (1964) estimator.
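
The identity between the integral of the estimated conditional quantile function and the Nadaraya-Watson estimator can be verified numerically. The sketch below is an illustration under stated assumptions: an Epanechnikov kernel, a midpoint rule for the integral over $s$, and a small hypothetical data set; the function names are not from the original text.

```python
def epanechnikov(u):
    """Epanechnikov kernel supported on [-1, 1] (an illustrative choice of K)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def nw_weights(x, xs, a_n, kernel=epanechnikov):
    """Nadaraya-Watson weights W_i(x, a_n), normalised to sum to one."""
    raw = [kernel((x - xi) / a_n) for xi in xs]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no observation falls in the kernel window")
    return [r / total for r in raw]


def quantile_integral(ys, weights, power=1, grid=20000):
    """Midpoint-rule value of the integral over (0, 1) of F_hat^{-1}(s|x)**power."""
    pairs = sorted((y, w) for y, w in zip(ys, weights) if w > 0.0)
    total, acc, j = 0.0, pairs[0][1], 0
    for k in range(grid):
        s = (k + 0.5) / grid
        # Advance to the smallest Y whose cumulative weight reaches s.
        while acc < s and j < len(pairs) - 1:
            j += 1
            acc += pairs[j][1]
        total += pairs[j][0] ** power
    return total / grid


xs = [0.1, 0.2, 0.35, 0.5, 0.7, 0.9]   # hypothetical design points
ys = [1.0, 1.4, 0.8, 2.0, 2.5, 3.1]    # hypothetical responses
w = nw_weights(0.4, xs, a_n=0.3)

m_nw = sum(wi * yi for wi, yi in zip(w, ys))            # Nadaraya-Watson estimate
m_int = quantile_integral(ys, w)                        # integral of the quantile function
var_hat = quantile_integral(ys, w, power=2) - m_int ** 2
```

The two estimates of $m(0.4)$ agree up to the discretisation error of the midpoint rule, and `var_hat` is the corresponding estimate of $\sigma^2(0.4)$.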

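With these estimators of $m(x)$ and $\sigma(x)$, each residual $\varepsilon_i$ is estimated by $\hat\varepsilon_i = (Y_i - \hat m(X_i))/\hat\sigma(X_i)$. The estimator of $F_\varepsilon(t)$ based on the estimated residuals is then defined by $\hat F_\varepsilon(t) = n^{-1} \sum_{i=1}^{n} \mathbb{1}(\hat\varepsilon_i \leq t)$. To determine the limit distribution of this estimator, Akritas and Van Keilegom first establish an asymptotic expansion of $\hat F_\varepsilon(t)$, given by the following theorem.

Theorem 1.5. Assume that the distribution function $F_X$ of $X$ is three times differentiable on the support $\mathcal{X}$ of $X$, and that the density $f_X$ of $X$ satisfies $\inf_{x \in \mathcal{X}} f_X(x) > 0$. Assume also that the functions $m(\cdot)$ and $\sigma(\cdot)$ are twice continuously differentiable on $\mathcal{X}$ and that $\inf_{x \in \mathcal{X}} \sigma(x) > 0$. Then, for every $t \in \mathbb{R}$,

\[ \hat F_\varepsilon(t) - F_\varepsilon(t) = \frac{1}{n} \sum_{i=1}^{n} \left\{ \mathbb{1}(\varepsilon_i \leq t) - F_\varepsilon(t) + \varphi(X_i, Y_i, t) \right\} + \beta_n(t) + R_n(t), \]

where $R_n(t)$ is an asymptotically negligible remainder term,

\[ \varphi(x, y, t) = -\frac{f_\varepsilon(t)}{\sigma(x)} \int \left[\mathbb{1}(y \leq v) - F(v \mid x)\right] \left(1 + t\, \frac{v - m(x)}{\sigma(x)}\right) dv, \]

\[ \beta_n(t) = \frac{a_n^2}{2}\, \mu_K \int \left.\frac{\partial^2}{\partial x^2}\, \mathbb{E}\left[\varphi(x, Y, t) \mid u\right]\right|_{x = u} dF_X(u), \]

with $f_\varepsilon$ the density of $\varepsilon$, $\mu_K$ a constant depending on $K$, and $F_X$ the distribution function of $X$.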

From this theorem, Akritas and Van Keilegom deduce the following corollary, which gives an asymptotic convergence result for the process $n^{1/2}(\hat F_\varepsilon(t) - F_\varepsilon(t))$. This result extends the work of Durbin (1973) and Loynes (1980) on the asymptotic distribution of an estimator of the distribution function of the residuals based on estimated parameters.

Corollary 1.2. Suppose that the conditions of Theorem 1.5 hold.

(i) If $n a_n^4 \to 0$, then the process $n^{1/2}(\hat F_\varepsilon(t) - F_\varepsilon(t))$, $t \in \mathbb{R}$, converges in distribution to a Gaussian process $Z(t)$ with mean

\[ \mathbb{E}Z(t) = \mathbb{E}\left[\mathbb{1}(\varepsilon \leq t) - F_\varepsilon(t) + \varphi(X, Y, t)\right] = 0, \]

and covariance function

\[ \mathrm{Cov}\left(Z(t_1), Z(t_2)\right) = \mathbb{E}\left[ \left(\mathbb{1}(\varepsilon \leq t_1) - F_\varepsilon(t_1) + \varphi(X, Y, t_1)\right) \left(\mathbb{1}(\varepsilon \leq t_2) - F_\varepsilon(t_2) + \varphi(X, Y, t_2)\right) \right]. \]

(ii) If $a_n = C n^{-1/4}$, with $C > 0$, then the process $n^{1/2}(\hat F_\varepsilon(t) - F_\varepsilon(t))$, $t \in \mathbb{R}$, converges in distribution to a Gaussian process $\tilde Z(t)$ with mean

\[ \mathbb{E}\tilde Z(t) = \frac{C^2}{2}\, \mu_K \int \left.\frac{\partial^2}{\partial x^2}\, \mathbb{E}\left[\varphi(x, Y, t) \mid u\right]\right|_{x = u} dF_X(u), \]

and the same covariance function as the process $Z(t)$.

The first point of the previous corollary shows that if $n a_n^4$ tends to 0, then for every $t \in \mathbb{R}$,

\[ n^{1/2}\left(\hat F_\varepsilon(t) - F_\varepsilon(t)\right) \xrightarrow{d} \mathcal{N}\left(0, \mathrm{Var} Z(t)\right). \tag{1.11} \]

Moreover, since $\mathbb{E}[\varphi(X, Y, t)] = 0$, a simple calculation shows that

\[ \mathrm{Var} Z(t) = \mathbb{E}\left[ \left(\mathbb{1}(\varepsilon \leq t) - F_\varepsilon(t) + \varphi(X, Y, t)\right)^2 \right] = F_\varepsilon(t)\left(1 - F_\varepsilon(t)\right) + \mathbb{E}\left[\varphi^2(X, Y, t) + 2\, \mathbb{1}(\varepsilon \leq t)\, \varphi(X, Y, t)\right]. \tag{1.12} \]

But by the Central Limit Theorem, the estimator $F_n(t) = n^{-1} \sum_{i=1}^{n} \mathbb{1}(\varepsilon_i \leq t)$ of $F_\varepsilon$, based on the true residuals, satisfies

\[ n^{1/2}\left(F_n(t) - F_\varepsilon(t)\right) \xrightarrow{d} \mathcal{N}\left(0, F_\varepsilon(t)\left(1 - F_\varepsilon(t)\right)\right). \]

This result, together with (1.11) and (1.12), shows that the asymptotic variance obtained with the estimator $\hat F_\varepsilon(t)$ is smaller than the asymptotic variance $F_\varepsilon(t)(1 - F_\varepsilon(t))$ obtained with $F_n(t)$ when

\[ \mathbb{E}\left[\varphi^2(X, Y, t) + 2\, \mathbb{1}(\varepsilon \leq t)\, \varphi(X, Y, t)\right] < 0. \]

In this setting, estimating the residuals therefore has a positive impact on the limit distribution of the estimator of $F_\varepsilon(t)$. Note that these results do not cover the case where $a_n$ is of order $n^{-1/5}$, the optimal order of the bandwidth for estimating $m(\cdot)$.

In a more recent article, Neumeyer and Van Keilegom (2010) established results comparable to those of Akritas and Van Keilegom (2001) for the multiple heteroscedastic regression model $Y = m(X) + \sigma(X)\varepsilon$, $X \in \mathbb{R}^d$, $d \geq 1$.



1.5.2 Adaptive estimation of the residual density

Efromovich (2005, 2007) uses an adaptive method to estimate the density $f_\varepsilon$ of the error in homoscedastic and heteroscedastic regression models. The method is adaptive with respect to the smoothness of $f_\varepsilon$, measured by its order of differentiability $\alpha$. An estimator is said to be adaptive if it does not depend on $\alpha$ but converges to $f_\varepsilon$ at the same rate as the optimal estimators constructed with knowledge of $\alpha$ and based on the true residuals.

The models considered are of the form $Y = m(X) + \varepsilon$ for the homoscedastic regression model, or of the form $Y = m(X) + \sigma(X)\zeta$ for the heteroscedastic regression model. These models are based on a sample of i.i.d. observations $(X_1, Y_1), \dots, (X_n, Y_n)$ with the same distribution as $(X, Y)$. The variables $\zeta$ and $\varepsilon$ are assumed to be centred and independent of $X$. The functions $m(\cdot)$ and $\sigma(\cdot)$ are unknown and defined on $[0, 1]$. Efromovich studies an estimator of the error density according to the nature of the support of the error. One distinguishes the case where the error term has bounded support $[-1, 1]$ from the case where it has unbounded support $(-\infty, \infty)$. In this subsection we discuss only the latter case. For the former, the reader may refer to the paper by Efromovich (2005).

When the error term has support $(-\infty, \infty)$, the study is carried out with the homoscedastic regression model $Y = m(X) + \varepsilon$, where the regression function $m$ is assumed to be unknown and defined on $[0, 1]$. To estimate the density $f_\varepsilon$ of the error $\varepsilon$, Efromovich uses an estimator based on a cosine series expansion. Estimating $f_\varepsilon$ requires splitting the observations into three subsamples. The first subsample, of size $n_1$, is used to estimate the marginal density $p$ of $X$. The second part of the sample (of size $n_1$) is reserved for estimating the regression function $m$, while the last subsample (of size $n_2 = n - 2n_1$) is reserved for estimating the density $f_\varepsilon$. For every $u \in [0, 1]$, set

\[ \varphi_0(u) = 1, \quad \varphi_j(u) = \sqrt{2} \cos(\pi j u), \quad j > 0. \]

The estimators of $p$ and $m$ are then defined, for $x \in [0, 1]$, by

\[ \hat p(x) = \max\left( b_n^{-1},\; n_1^{-1} \sum_{l=1}^{n_1} \sum_{s=0}^{S} \varphi_s(X_l)\, \varphi_s(x) \right), \]

\[ \hat m(x) = n_1^{-1} \sum_{l=n_1+1}^{2n_1} \sum_{s=0}^{S} \frac{Y_l\, \varphi_s(X_l)}{\hat p(X_l)}\, \varphi_s(x), \tag{1.13} \]

where $b_n = 4 + \ln\ln(n + 20)$, $n_1 = n_1(n)$ denotes the smallest integer greater than or equal to $n/b_n$, and $S = S_n$ is the smallest integer greater than or equal to $n^{1/3}$.

With these estimators of $p$ and $m$, Efromovich estimates the residuals $\varepsilon_l$, $l = 2n_1 + 1, \dots, n$, by

\[ \hat\varepsilon_l = Y_l - \hat m(X_l), \quad l = 2n_1 + 1, \dots, n. \]

For $t \in \mathbb{R}$, the estimator $\hat f_\varepsilon(t)$ of $f_\varepsilon(t)$ is then defined, following the estimation method of Pinsker (1980), by

\[ \hat f_\varepsilon(t) = \sum_{j=0}^{k_n} \hat\mu_j \hat\theta_j \varphi_j(t), \quad \hat\theta_j = (n - 2n_1)^{-1} \sum_{l=2n_1+1}^{n} \varphi_j(\hat\varepsilon_l), \]

where $k_n$ is the smallest integer greater than or equal to $n^{1/5} b_n$, and the $\hat\theta_j$ are the estimators of the Fourier coefficients $\theta_j = \int_0^1 f_\varepsilon(u) \varphi_j(u)\, du$. The shrinkage coefficients $\hat\mu_j$ are obtained by the following procedure. The set $\mathbb{N}$ of natural numbers is partitioned into non-overlapping blocks $B_k$, $k = 1, 2, \dots$, with $L_k = \mathrm{Card}(B_k)$, and one sets $t_k = 1/\ln(k + 2)$. The $\hat\mu_j$ are then defined by

\[ \hat\mu_j = \frac{L_k^{-1} \sum_{s \in B_k} \hat\theta_s^2 - n^{-1}}{L_k^{-1} \sum_{s \in B_k} \hat\theta_s^2}\; \mathbb{1}\left( L_k^{-1} \sum_{s \in B_k} \hat\theta_s^2 > (1 + t_k)\, n^{-1} \right), \quad j \in B_k. \tag{1.14} \]

To assess the performance of the estimator $\hat f_\varepsilon(t)$, Efromovich considers the estimator $\tilde f_\varepsilon(t)$ of $f_\varepsilon$ based on the true residuals. This estimator is defined by

\[ \tilde f_\varepsilon(t) = \sum_{j=0}^{k_n} \tilde\mu_j \tilde\theta_j \varphi_j(t), \quad \tilde\theta_j = (n - 2n_1)^{-1} \sum_{l=2n_1+1}^{n} \varphi_j(\varepsilon_l), \]

where the coefficients $\tilde\mu_j$ are defined as in (1.14), replacing only the $\hat\theta_j$ by the pseudo-estimators $\tilde\theta_j$ of the coefficients $\theta_j$. Defining the mean integrated squared error

\[ \mathrm{MISE}(\hat f_\varepsilon, f_\varepsilon) = \mathbb{E} \int_0^1 \left(\hat f_\varepsilon(t) - f_\varepsilon(t)\right)^2 dt, \]

Efromovich obtains the following result.

Theorem 1.6. Efromovich (2005)

Assume that the functions $p$ and $m$ are of class $C^1$ on $[0, 1]$. Then, under certain regularity conditions,

\[ \mathrm{MISE}(\hat f_\varepsilon, f_\varepsilon) \leq \left(1 + \frac{1}{\ln b_n}\right) \mathrm{MISE}(\tilde f_\varepsilon, f_\varepsilon) + \frac{C b_n^3}{n}, \]

where $C$ is a strictly positive constant.

In a more recent article, Efromovich (2007) shows that the result of the previous theorem remains valid without splitting the sample.

When the density $f_\varepsilon$ has a generalised derivative of order $\alpha > 2$, Efromovich shows that the estimator $\tilde f_\varepsilon$ based on the true residuals attains the minimax convergence rate $n^{-2\alpha/(2\alpha+1)}$ for the mean integrated squared risk. Hence Theorem 1.6 proves that there is no loss (in the sense of the minimax rate) from not observing the residuals. Consequently, since $\tilde f_\varepsilon$ is adaptive with respect to the smoothness of $f_\varepsilon$, so is the estimator $\hat f_\varepsilon$.

In a recent article, Plancade (2008) presents a nonparametric estimator of the error density in a homoscedastic regression model, based on model selection techniques. With this method, Plancade proposes an upper bound on the integrated squared risk and obtains the same minimax rate as Efromovich (2005).
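
The cosine-series construction used in this subsection can be sketched in a few lines. The code below is a simplified illustration, not Efromovich's estimator: it takes all shrinkage coefficients equal to 1 unless others are supplied (plain projection instead of blockwise shrinkage), skips the sample splitting, and uses a hypothetical list of residuals and cutoff.

```python
import math


def phi(j, u):
    """Cosine basis on [0, 1]: phi_0 = 1 and phi_j(u) = sqrt(2) * cos(pi * j * u)."""
    return 1.0 if j == 0 else math.sqrt(2.0) * math.cos(math.pi * j * u)


def fourier_coefficients(residuals, k_n):
    """Empirical Fourier coefficients theta_hat_j = mean of phi_j(residual), j = 0..k_n."""
    n = len(residuals)
    return [sum(phi(j, e) for e in residuals) / n for j in range(k_n + 1)]


def series_density(t, theta_hat, mu=None):
    """Series estimate sum_j mu_j * theta_hat_j * phi_j(t); mu defaults to all ones."""
    if mu is None:
        mu = [1.0] * len(theta_hat)
    return sum(m * th * phi(j, t) for j, (m, th) in enumerate(zip(mu, theta_hat)))


residuals = [0.12, 0.35, 0.41, 0.48, 0.55, 0.63, 0.80]  # hypothetical values in [0, 1]
theta_hat = fourier_coefficients(residuals, k_n=3)
density_mid = series_density(0.5, theta_hat)
```

Supplying a list `mu` of data-driven weights in place of the default turns this projection estimator into a shrinkage estimator of the kind described above.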

1.5.3 Estimation of the variance function in heteroscedastic regression

In this subsection we give an example of the influence of estimating the mean function $m(\cdot)$ on the estimation of the variance function $V(\cdot)$ in the heteroscedastic regression model

\[ Y_i = m(x_i) + V^{1/2}(x_i)\, \varepsilon_i, \quad i = 1, \dots, n, \tag{1.15} \]

where $x_i = i/n$ and the $\varepsilon_i$ are i.i.d. random variables, centred, with variance equal to 1 and finite fourth moments. In this model the parameter of interest is the function $V$, and we are interested in the impact of $m$ on the estimation of $V$. The quality of this estimation depends strongly on the smoothness of the regression function $m$. This problem was studied by Wang, Brown, Cai and Levine (2008), who showed that the impact of $m$ on the estimator of $V$ can be evaluated explicitly. This impact is measured through the global and local mean squared errors defined by

\[ R_n = \mathbb{E} \int_0^1 \left(\hat V_n(x) - V(x)\right)^2 dx, \quad R_n(x) = \mathbb{E}\left(\hat V_n(x) - V(x)\right)^2. \]

Here $\hat V_n(x)$ denotes a nonparametric estimator of $V(x)$. The estimator considered by Wang et al. (2008) is defined as follows. First consider a kernel $K$ supported on $[-1, 1]$. Next, for $i = 2, \dots, n - 2$, set $a_i = (x_i + x_{i-1})/2$ and $b_i = (x_i + x_{i+1})/2$. Finally, for $i = 2, \dots, n - 2$, $0 < h < 1/2$ and $x \in [0, 1]$, define

\[ K_i^h(x) = \int_{a_i}^{b_i} \frac{1}{h}\, K\left(\frac{x - u}{h}\right) du, \]

and take this integral from 0 to $(x_1 + x_2)/2$ for $i = 1$, and from $(x_{n-1} + x_{n-2})/2$ to 1 for $i = n - 1$. Under certain assumptions on the kernel $K$, one can check that for every $x \in [0, 1]$, $\sum_{i=1}^{n-1} K_i^h(x) = 1$. The estimator $\hat V_n(x)$ of $V(x)$ is then defined by

\[ \hat V_n(x) = \frac{1}{2} \sum_{i=1}^{n-1} K_i^h(x) \left(Y_i - Y_{i+1}\right)^2. \tag{1.16} \]

For $\alpha>0$ and $M>0$, consider the Lipschitz class

$$\mathcal{L}^\alpha(M) = \Big\{g : \forall x,y\in[0,1],\ \forall k = 0,\dots,\lfloor\alpha\rfloor-1,\ |g^{(k)}(x)|\le M,\ \big|g^{(\lfloor\alpha\rfloor)}(x) - g^{(\lfloor\alpha\rfloor)}(y)\big|\le M|x-y|^{\alpha'}\Big\},$$

where $\lfloor\alpha\rfloor$ is the largest integer less than $\alpha$ and $\alpha' = \alpha - \lfloor\alpha\rfloor$. The following result then holds.

Theorem 1.7 (Wang, Brown, Cai and Levine, 2008). Consider the regression model (1.15), where $x_i = i/n$ and the $\varepsilon_i$ are i.i.d. centered random variables with variance 1 and finite fourth moments. Suppose there exist strictly positive constants $\alpha$, $\beta$, $M_1$ and $M_2$ such that $m\in\mathcal{L}^\alpha(M_1)$ and $V\in\mathcal{L}^\beta(M_2)$. Then, under suitable assumptions, the optimal bandwidth $h_n$ for the estimator $\hat V_n(x)$ of $V(x)$ is of order $n^{-1/(2\beta+1)}$. Moreover, for such an optimal choice of $h_n$, the minimax rate of convergence for $R_n$ and $R_n(x)$ is of order $\max\{n^{-4\alpha}, n^{-2\beta/(2\beta+1)}\}$.

This theorem allows one to compare the performance (in terms of minimax rate) of the estimator $\hat V_n(x)$ with that of an estimator $\tilde V_n(x)$ based on an estimate $\hat m_n$ of $m$. Such an estimator $\tilde V_n(x)$ has the form

$$\tilde V_n(x) = \sum_i w_i(x)\,\big(Y_i - \hat m_n(x_i)\big)^2, \tag{1.17}$$

where the $w_i(x)$ are weight functions. With the estimator $\tilde V_n(x)$, the minimax rate $\max\{n^{-4\alpha}, n^{-2\beta/(2\beta+1)}\}$ can only be attained if the mean function $m$ is estimated by an estimator $\hat m_n$ with small bias. This led Wang, Brown, Cai and Levine (2008) to take an estimator $\hat m_n$ of $m$ such that $\hat m_n(x_i) = Y_{i+1}$, which, plugged into (1.17), leads to an estimator of the type (1.16). Such an estimator has a fairly large variance and, for $n$ large enough, a sufficiently small bias. The authors proved, however, that a large variance of $\hat m_n$ does not affect the rate of convergence of $\tilde V_n$. Hence, for the estimation of $V$, an optimal estimator $\hat m_n$ is one with minimal bias, not necessarily one with minimal mean squared error: the squared bias of $\hat m_n$ matters more than its variance. Consequently, an estimator that is optimal for estimating $m$ is of little use here, since such an estimator asymptotically balances the squared bias and the variance.
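
To make the difference-based idea behind (1.16) concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the estimator of Wang et al. (2008): the integrated weights $K_i^h(x)$ are replaced by plain normalized Epanechnikov weights, and the function names `epanechnikov` and `difference_variance_estimator` are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def difference_variance_estimator(x_eval, y, h):
    """Kernel-weighted average of the half squared differences (Y_i - Y_{i+1})^2 / 2.

    Assumes the fixed design x_i = i/n. The weights are simple normalized
    kernel weights (an assumption standing in for the weights K_i^h(x)).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    x = np.arange(1, n) / n                        # design points x_1, ..., x_{n-1}
    half_sq_diff = 0.5 * (y[:-1] - y[1:]) ** 2     # no estimate of m is needed
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    w = epanechnikov((x_eval[:, None] - x[None, :]) / h)
    w = w / w.sum(axis=1, keepdims=True)           # weights sum to 1 at each x
    return w @ half_sq_diff

# Usage on simulated data with m(x) = sin(2*pi*x) and V(x) = 0.5 + x.
rng = np.random.default_rng(0)
n = 2000
x = np.arange(1, n + 1) / n
y = np.sin(2 * np.pi * x) + np.sqrt(0.5 + x) * rng.standard_normal(n)
print(difference_variance_estimator([0.25, 0.5, 0.75], y, h=0.1))
```

Differencing consecutive responses amounts to estimating $m(x_i)$ by $Y_{i+1}$: the bias is tiny when $m$ is smooth, and the large variance of this crude fit does not matter for the rate, which is the point made above.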

1.5.4 Estimation of the residual density based on a Nadaraya-Watson estimator of the regression function

The problem of nonparametric estimation of the density $f$ of the residuals was considered by Cheng (2005) in the nonparametric regression model $Y = m(X) + \varepsilon$. In this model the regression function $m$ is defined on $[0,1]$, and the proposed estimators are built from the observations $(X_1,Y_1),\dots,(X_n,Y_n)$. These observations are split into two parts. The first part serves to estimate the residuals $\varepsilon_i = Y_i - m(X_i)$, while the second part is reserved for the construction of the estimator of $f$. The estimators $\hat\varepsilon_i$ of the residuals $\varepsilon_i$ are obtained from estimates of the quantities $m(X_i)$. To this end, Cheng considers an integer $r_n$ depending on $n$ and satisfying

$$0 < r_n < n/2,\qquad \lim_{n\to\infty} r_n = \infty,\qquad \lim_{n\to\infty}(n - r_n) = \infty.$$

He uses the first $r_n$ observations $(X_1,Y_1),\dots,(X_{r_n},Y_{r_n})$ to build the estimator of the function $m(x)$. This estimator is the Nadaraya-Watson estimator based on the data $(X_1,Y_1),\dots,(X_{r_n},Y_{r_n})$:

$$\hat m_n(x) = \frac{\sum_{j=1}^{r_n}Y_jK\big((x - X_j)/h_n\big)}{\sum_{j=1}^{r_n}K\big((x - X_j)/h_n\big)},\quad x\in[0,1],$$






where $h_n$ is a strictly positive bandwidth tending to $0$ as $n$ tends to infinity, and $K$ is an integrable function on $\mathbb{R}$ with integral 1.

The remaining observations $(X_{r_n+1},Y_{r_n+1}),\dots,(X_n,Y_n)$ are used to estimate the residuals $\varepsilon_i$ by

$$\hat\varepsilon_i = Y_i - \hat m_n(X_i),\quad r_n+1\le i\le n.$$

The nonparametric estimator of the residual density built by Cheng is then defined by

$$\hat f_n(t) = \frac{1}{2(n-r_n)a_n}\sum_{i=r_n+1}^n \mathbb{1}\big(t - a_n\le\hat\varepsilon_i\le t+a_n\big),\quad t\in\mathbb{R}.$$

With this estimator, Cheng (2005) obtains the following result.

Theorem 1.8. Let $t\in[0,1]$ be such that $f(t)>0$. Suppose that $0<r_n<n/2$ and that

$$\lim_{n\to\infty}(n-r_n)a_n^3 = 0,\qquad \lim_{n\to\infty}(n-r_n)a_n = \infty,\qquad \lim_{n\to\infty}\frac{(n-r_n)a_n\log r_n}{\cdots} = 0. \tag{1.18}$$

Assume also that the density $g$ of the $X_i$ is locally Lipschitz on $[0,1]$. Then, under further regularity assumptions, the following convergence in distribution holds:

$$\sqrt{2(n-r_n)a_n}\;\frac{\hat f_n(t) - f(t)}{\sqrt{f(t)}}\ \xrightarrow{d}\ N(0,1),$$

where $N(0,1)$ denotes the standard normal distribution.

It is well known in the statistical literature that $n^{-2/5}$ is the optimal rate of convergence for the nonparametric estimation of the density of a real random variable $\zeta$ from an i.i.d. sample $\zeta_1,\zeta_2,\dots,\zeta_n$; see, for example, Bosq and Lecoutre (1987), Scott (1992), and Wand and Jones (1995). For $0<r_n<n/2$, however, the theorem above shows that the rate $n^{-2/5}$ can only be reached by $\hat f_n(t)$ if the bandwidth $a_n$ is of order $n^{-1/5}$, and for such an order the first condition in (1.18) cannot hold. Hence, under the conditions of the theorem above, the estimator $\hat f_n(t)$ cannot reach the optimal rate $n^{-2/5}$, nor even come close to it. Indeed, since $n - r_n > n/2$, (1.18) implies $a_n = o(n^{-1/3})$, so that the normalizing factor $\sqrt{2(n-r_n)a_n}$ is $o(n^{1/3})$ and $\hat f_n(t)$ converges more slowly than $n^{-1/3}$.

This thesis improves on the results of Cheng (2005). We shall see that, under suitable assumptions, the estimators we propose for the density $f$ of the residuals can reach the rate $n^{-2/5}$ for $\dim(X)\le2$, where $\dim(X)$ denotes the dimension of the explanatory variable $X$.
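
The sample-splitting construction described above can be sketched in a few lines of Python. This is only an illustration for a scalar covariate: the Gaussian kernel, the function names and the tuning values in the usage lines are assumptions made here, not choices taken from Cheng (2005).

```python
import numpy as np

def gaussian_weights(u):
    """Unnormalized Gaussian kernel (the constant cancels in the ratio)."""
    return np.exp(-0.5 * u**2)

def split_sample_error_density(t, x, y, r, h, a):
    """Histogram-type estimate of f(t) from residuals of the last n - r points.

    The first r observations are used to fit the regression function; the
    remaining n - r observations provide the residuals.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Step 1: Nadaraya-Watson fit from the first r observations,
    # evaluated at the remaining design points.
    w = gaussian_weights((x[r:, None] - x[None, :r]) / h)
    m_hat = (w @ y[:r]) / w.sum(axis=1)
    # Step 2: estimated residuals on the remaining n - r observations.
    resid = y[r:] - m_hat
    # Step 3: fraction of residuals in [t - a, t + a], rescaled by 2a.
    inside = np.abs(resid - t) <= a
    return inside.sum() / (2.0 * (n - r) * a)

# Usage on simulated data (all numerical values are arbitrary).
rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(0.0, 1.0, n)
y = 1.0 + x + 0.5 * rng.standard_normal(n)
print(split_sample_error_density(0.0, x, y, r=400, h=0.1, a=0.2))
```

Only $n - r_n$ residuals enter the density estimate, which is one reason the split-sample estimator makes less efficient use of the data than an estimator built on all $n$ residuals.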



Chapter 2

Contribution of the thesis



2.1 Introduction 

The literature review in Chapter 1 shows that most of the authors cited above used estimated residuals to build an estimator of the error distribution. None of them, however, studied the impact of the dimension of the explanatory variable on the estimator of the error density $f$, nor the influence of the first-step bandwidth (used to estimate the regression function) on the final estimator of the residual density. This thesis therefore evaluates the impact of the dimension of $X$ on the estimation of the density $f$. We also determine pointwise rates of convergence for nonparametric estimators of $f$. A further major objective is to characterize the optimal choices of the first-step and second-step bandwidths used to estimate $f$.

We now give a brief presentation of our results, which are established in the next two chapters of the thesis.

2.2 Nonparametric conditional estimator of the residual density

To better illustrate the effect of the dimension of the explanatory variable $X$ on the estimation of the density $f$ of the residuals in the regression model (1.1), we first consider a naive method for estimating $f$ based on the relation

$$f(\epsilon|x) = \varphi\big(m(x)+\epsilon\,|\,x\big),$$

where $f(\cdot|x)$ and $\varphi(\cdot|x)$ denote the densities of $\varepsilon$ and $Y$ given $X=x$, respectively. Using the independence of $X$ and $\varepsilon$, we therefore have

$$f(\epsilon) = f(\epsilon|x) = \varphi\big(m(x)+\epsilon\,|\,x\big).$$

Following this idea, an estimator of $f(\epsilon)$ can be deduced from estimates of $\varphi(y|x)$ and $m(x)$. Accordingly, an estimator $\hat f_n(\epsilon|x)$ of $f(\epsilon)$ is defined by

$$\hat f_n(\epsilon|x) = \frac{\dfrac{1}{nh_0^dh_1}\displaystyle\sum_{i=1}^nK_0\Big(\dfrac{X_i-x}{h_0}\Big)K_1\Big(\dfrac{Y_i-\hat m_n(x)-\epsilon}{h_1}\Big)}{\dfrac{1}{nh_0^d}\displaystyle\sum_{i=1}^nK_0\Big(\dfrac{X_i-x}{h_0}\Big)},$$

where $h_0$ and $h_1$ are positive bandwidths, $K_0$ and $K_1$ are kernel functions defined on $\mathbb{R}^d$ and $\mathbb{R}$ respectively, and $\hat m_n(x)$ is the Nadaraya-Watson (1964) estimator of $m(x)$ defined by

$$\hat m_n(x) = \frac{\sum_{j=1}^nY_jK_0\big((X_j-x)/b_0\big)}{\sum_{j=1}^nK_0\big((X_j-x)/b_0\big)},$$

where $b_0$ is a positive bandwidth. The following theorem, proved later in this thesis, illustrates the negative effect of the dimension of $X$ on the asymptotic behavior of the estimator $\hat f_n(\epsilon|x)$.

Theorem 2.1. Define

$$\mu_1(x,\epsilon) = \frac{\partial^2\varphi(x,m(x)+\epsilon)}{\partial^2x}\int zK_0(z)z^\top dz,\qquad \mu_2(x,\epsilon) = \frac{\partial^2\varphi(x,m(x)+\epsilon)}{\partial^2y}\int v^2K_1(v)\,dv,$$

et supposons quebo, ho eth\ decroissent vers e£ satisfont nh}f / Inn — > oo, ln(l//i )/ hi(hm) — > oo 

naj^c (^)(^)=»(D. 

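as $n\to\infty$. Then, under regularity conditions on $m$, $g$, $\varphi$, $K_0$ and $K_1$, we have

$$\sqrt{nh_0^dh_1}\,\big(\hat f_n(\epsilon|x) - \bar f_n(\epsilon|x)\big)\ \xrightarrow{d}\ N\Big(0,\ \frac{f(\epsilon|x)}{g(x)}\int K_0^2(z)\,dz\int K_1^2(v)\,dv\Big),$$

where $g(\cdot)$ denotes the marginal density of $X$ and

$$\bar f_n(\epsilon|x) = f(\epsilon|x) + \frac{h_0^2\mu_1(x,\epsilon)}{2g(x)} + \frac{h_1^2\mu_2(x,\epsilon)}{2g(x)} + o\big(h_0^2+h_1^2\big).$$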

This theorem suggests that, in view of the asymptotic normality of $\hat f_n(\epsilon|x)$, the optimal bandwidths $h_0$ and $h_1$ are those minimizing the asymptotic mean squared error

$$\mathrm{AMSE}\big(\hat f_n(\epsilon|x)\big) = \Big(\frac{h_0^2\mu_1(x,\epsilon)}{2g(x)} + \frac{h_1^2\mu_2(x,\epsilon)}{2g(x)}\Big)^2 + \frac{f(\epsilon|x)\int K_0^2(z)\,dz\int K_1^2(v)\,dv}{nh_0^dh_1g(x)}.$$

A simple calculation shows that the optimal bandwidths $h_0$ and $h_1$ are both of order $n^{-1/(d+5)}$, leading to the optimal rate of convergence $n^{-2/(d+5)}$ for $\hat f_n(\epsilon|x)$. Consequently, for $d=1$ this rate is of order $n^{-1/3}$, which is worse than the optimal rate $n^{-2/5}$ attained when estimating a univariate density; for the latter see, for example, Bosq and Lecoutre (1987), Scott (1992), and Wand and Jones (1995). Note also that the exponent $2/(d+5)$ decreases to $0$ as $d$ grows. This illustrates the negative impact of the dimension of $X$ on the performance (in the sense of the optimal rate of convergence) of $\hat f_n(\epsilon|x)$: the "curse of dimensionality". The problem stems from conditioning on $x$ in the expression $f(\epsilon) = f(\epsilon|x) = \varphi(m(x)+\epsilon|x)$, where the unconditional density $f(\epsilon)$ is identified with the conditional density $f(\epsilon|x)$ under the independence of $\varepsilon$ and $X$. Moreover, using $\hat f_n(\epsilon|x)$ would require choosing $x$: although the density $f(\epsilon)$ does not depend on $x$, the estimator $\hat f_n(\epsilon|x)$ does.

To overcome this curse of dimensionality, one must therefore remove the conditioning in the expression of $f(\epsilon)$ above. Two approaches are proposed in the remainder of this thesis; they are summarized in the next two sections.
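
The cost of the rate $n^{-2/(d+5)}$ can be made tangible with a small computation. The Python sketch below is an illustration only (the function names are hypothetical): it returns the rate exponent of the conditional estimator and the sample size $N = n^{(d+5)/5}$ it would need so that $N^{-2/(d+5)}$ equals the univariate accuracy $n^{-2/5}$, ignoring constants.

```python
from fractions import Fraction

def conditional_rate_exponent(d):
    """Exponent r such that the conditional estimator converges at rate n**(-r)."""
    return Fraction(2, d + 5)

def equivalent_sample_size(n, d):
    """Sample size N solving N**(-2/(d+5)) == n**(-2/5), constants ignored."""
    return n ** ((d + 5) / 5)

# The exponent shrinks with d and is below 2/5 already for d = 1.
for d in (1, 2, 5, 10):
    print(d, conditional_rate_exponent(d), equivalent_sample_size(1000, d))
```

The comparison is purely about orders of magnitude; the constants in front of the rates, which depend on $\mu_1$, $\mu_2$, $g$ and the kernels, are left out.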

2.3 Estimation of the error density using estimated residuals

This first approach consists, in a first step, in estimating nonparametrically the residuals of model (1.1) by

$$\hat\varepsilon_i = Y_i - \hat m_{in},\quad i=1,\dots,n,$$

where $\hat m_{in} = \hat m_{in}(X_i)$ denotes the leave-one-out kernel estimator of $m(X_i)$ defined by

$$\hat m_{in} = \frac{\sum_{j=1,\,j\neq i}^nY_jK_0\big((X_j-X_i)/b_0\big)}{\sum_{j=1,\,j\neq i}^nK_0\big((X_j-X_i)/b_0\big)}.$$

In a second step, these estimated residuals are used as if they were the true ones to build a nonparametric estimator of $f(\epsilon)$. The construction takes into account that the $\hat m_n(X_i)$ can be biased estimators of the $m(X_i)$ when the $X_i$ are very close to the boundary of their support $\mathcal{X}$. The estimator of $f(\epsilon)$ is therefore built from the observations $X_i$ lying in an inner subset $\mathcal{X}_0$ of $\mathcal{X}$, and is defined by

$$\hat f_{1n}(\epsilon) = \frac{1}{b_1\sum_{i=1}^n\mathbb{1}(X_i\in\mathcal{X}_0)}\sum_{i=1}^n\mathbb{1}(X_i\in\mathcal{X}_0)\,K_1\Big(\frac{\hat\varepsilon_i-\epsilon}{b_1}\Big).$$

In principle one could let $\mathcal{X}_0$ be close enough to $\mathcal{X}$ for $\hat f_{1n}(\epsilon)$ to be very close to the "classical" estimator $\sum_{i=1}^nK_1\big((\hat\varepsilon_i-\epsilon)/b_1\big)/(nb_1)$. In the remainder of this thesis, however, we consider a fixed subset $\mathcal{X}_0$ for convenience. Note also that $\hat f_{1n}(\epsilon)$ does not depend on any unknown parameter, as desired in practice. This contrasts with the ideal nonparametric estimator

$$\tilde f_{1n}(\epsilon) = \frac{1}{b_1\sum_{i=1}^n\mathbb{1}(X_i\in\mathcal{X}_0)}\sum_{i=1}^n\mathbb{1}(X_i\in\mathcal{X}_0)\,K_1\Big(\frac{\varepsilon_i-\epsilon}{b_1}\Big),$$

which depends in particular on the unobserved residuals $\varepsilon_i$. The estimator $\hat f_{1n}(\epsilon)$ is nevertheless very close to $\tilde f_{1n}(\epsilon)$, as the following theorem suggests.

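Theorem 2.2. Suppose that $b_0$ and $b_1$ decrease to $0$ in such a way that $\ln(1/b_0)/\ln(\ln n)\to\infty$, $nb_0^{d^*}/\ln n\to\infty$ with $d^* = \sup\{d+2,2d\}$, and $n^{(d+8)}b_1^{7(d+4)}\to\infty$ as $n\to\infty$. Then, under regularity conditions on $m$, $g$, $f$, $K_0$ and $K_1$, we have

$$\hat f_{1n}(\epsilon) - \tilde f_{1n}(\epsilon) = O_{\mathbb P}\big(R_n(b_0,b_1)\big)^{1/2},\qquad \hat f_{1n}(\epsilon) - f(\epsilon) = O_{\mathbb P}\big(\mathrm{AMSE}(b_1)+R_n(b_0,b_1)\big)^{1/2},$$

where

$$\mathrm{AMSE}(b_1) = \mathbb{E}_n\Big[\big(\tilde f_{1n}(\epsilon)-f(\epsilon)\big)^2\Big] = O_{\mathbb P}\Big(b_1^4+\frac{1}{nb_1}\Big)$$

and

$$R_n(b_0,b_1) = b_0^4 + \Big[\frac{1}{(nb_1^5)^{1/2}} + \Big(\frac{b_0^d}{b_1^3}\Big)^{1/2}\Big]^2\Big(b_0^4+\frac{1}{nb_0^d}\Big)^2 + \Big[\frac{1}{b_1}+\Big(\frac{b_0^d}{b_1^7}\Big)^{1/2}\Big]^2\Big(b_0^4+\frac{1}{nb_0^d}\Big)^3.$$

These results give a first idea of the impact of the estimation of the residuals on the nonparametric estimator of the density $f(\epsilon)$.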

The next theorem determines the optimal choice of the first-step bandwidth $b_0$. To our knowledge, this aspect has not yet been studied in the statistical literature. In what follows, $a_n\asymp b_n$ means that $a_n = O(b_n)$ and $b_n = O(a_n)$, that is, there exists a constant $C>0$ such that $|a_n|/C\le|b_n|\le C|a_n|$ for $n$ large enough.

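Theorem 2.3. Consider the bandwidth

$$b_0^* = b_0^*(b_1) = \arg\min_{b_0}R_n(b_0,b_1),$$

where the minimization is over the set of bandwidths $b_0$ satisfying the conditions of the previous theorem. Then the bandwidth $b_0^*$ satisfies

$$b_0^*\asymp\max\Big\{\Big(\frac{1}{n^2b_1^3}\Big)^{\frac{1}{d+4}},\ \Big(\frac{1}{n^3b_1^7}\Big)^{\frac{1}{2d+4}}\Big\},$$

and we have

$$R_n(b_0^*,b_1)\asymp\max\Big\{\Big(\frac{1}{n^2b_1^3}\Big)^{\frac{4}{d+4}},\ \Big(\frac{1}{n^3b_1^7}\Big)^{\frac{4}{2d+4}}\Big\}.$$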

From this theorem we deduce the following result, which gives conditions under which the estimator $\hat f_{1n}(\epsilon)$ reaches the optimal rate $n^{-2/5}$ when $b_0 = b_0^*$.

Theorem 2.4. Consider the bandwidth

$$b_1^* = \arg\min_{b_1}\big(\mathrm{AMSE}(b_1)+R_n(b_0^*,b_1)\big),$$

where $b_0^* = b_0^*(b_1)$ is defined as in the previous theorem. Then

1. For $d\le2$, the bandwidth $b_1^*$ satisfies

$$b_1^*\asymp\Big(\frac1n\Big)^{\frac15},$$

and we have

$$\big(\mathrm{AMSE}(b_1^*)+R_n(b_0^*,b_1^*)\big)^{1/2}\asymp\Big(\frac1n\Big)^{\frac25}.$$

2. For $d\ge3$, $b_1^*$ satisfies

$$b_1^*\asymp\Big(\frac1n\Big)^{\frac{3}{2d+11}},$$

and we have

$$\big(\mathrm{AMSE}(b_1^*)+R_n(b_0^*,b_1^*)\big)^{1/2}\asymp\Big(\frac1n\Big)^{\frac{6}{2d+11}}.$$

These results show that for $d\le2$ the difference $\hat f_{1n}(\epsilon)-f(\epsilon)$ converges at rate $n^{-2/5}$, which is the optimal rate for estimating the density of a univariate variable. In this case the estimation of the residuals has a positive impact on the estimator of $f(\epsilon)$, in the sense that the optimal rate is preserved. For $d\ge3$, however, the rate $n^{-2/5}$ cannot be reached with the estimator $\hat f_{1n}(\epsilon)$.
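
As a quick numerical companion to Theorems 2.3 and 2.4, the following Python sketch tabulates the orders involved. It is an illustration only: the exponents $1/5$, $2/5$, $3/(2d+11)$ and $6/(2d+11)$ are taken as read from the statements above (with constants ignored), and the function names are hypothetical.

```python
from fractions import Fraction

def second_step_exponents(d):
    """Return (a, r) with b1* of order n**(-a) and error of order n**(-r)."""
    if d <= 2:
        return Fraction(1, 5), Fraction(2, 5)
    return Fraction(3, 2 * d + 11), Fraction(6, 2 * d + 11)

def first_step_order(n, b1, d):
    """Order of the first-step bandwidth b0* for given n and b1 (constants ignored)."""
    return max((n**2 * b1**3) ** (-1.0 / (d + 4)),
               (n**3 * b1**7) ** (-1.0 / (2 * d + 4)))

# Exponents by dimension, and the implied order of b0* for n = 10**4.
n = 10**4
for d in (1, 2, 3, 5):
    a, r = second_step_exponents(d)
    b1 = n ** (-float(a))
    print(d, a, r, first_step_order(n, b1, d))
```

The two regimes join continuously at $d = 2$, since $3/(2d+11) = 1/5$ and $6/(2d+11) = 2/5$ there, which is consistent with $d \le 2$ being the boundary case for the optimal rate.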

We also obtain the following asymptotic normality result.

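Theorem 2.5. Suppose that

$$nb_0^{d+4} = O(1),\qquad nb_0^4b_1 = o(1),\qquad nb_0^db_1^3\to\infty,$$

as $n$ tends to infinity. Then, under regularity conditions, we have

$$\sqrt{nb_1}\,\big(\hat f_{1n}(\epsilon)-\bar f_{1n}(\epsilon)\big)\ \xrightarrow{d}\ N\Big(0,\ \frac{f(\epsilon)}{\mathbb{P}(X\in\mathcal{X}_0)}\int K_1^2(v)\,dv\Big),$$

where

$$\bar f_{1n}(\epsilon) = f(\epsilon)+\frac{b_1^2}{2}f^{(2)}(\epsilon)\int v^2K_1(v)\,dv+o\big(b_1^2\big).$$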



The second approach used to estimate the density $f$ is summarized in the next section.

2.4 Estimation of the error density by integration of a conditional distribution

This approach starts from the observation that

$$f(\epsilon) = \int\varphi\big(\epsilon+m(x)\,|\,x\big)g(x)\,dx = \int\varphi\big(x,\epsilon+m(x)\big)\,dx,$$

where $g$ denotes the marginal density of $X$ and $\varphi(\cdot,\cdot)$ the joint density of the pair $(X,Y)$. This formula suggests estimating $f(\epsilon)$, in a second step, by

$$\hat f_{2n}(\epsilon) = \int\hat\varphi_n\big(x,\epsilon+\hat m_n(x)\big)\,dx,$$

where $\hat m_n(x)$ denotes the Nadaraya-Watson (1964) kernel estimator of $m(x)$ and $\hat\varphi_n$ the nonparametric estimator of $\varphi$. These estimators are defined as follows. Consider bandwidths $b_0 = b_0(n)$ and $b_1 = b_1(n)$ associated with the variable $X$, and a bandwidth $h = h(n)$ associated with the variable $Y$. Let $K_0$ and $K_1$ be kernel functions defined on $\mathbb{R}^d$, and let $K_2$ be a kernel function defined on $\mathbb{R}$. For all $(x,y)\in\mathbb{R}^d\times\mathbb{R}$, the estimators $\hat m_n(x)$ and $\hat\varphi_n(x,y)$ are defined by

$$\hat m_n(x) = \frac{\sum_{i=1}^nY_iK_0\big((X_i-x)/b_0\big)}{\sum_{i=1}^nK_0\big((X_i-x)/b_0\big)},\qquad \hat\varphi_n(x,y) = \frac{1}{nb_1^dh}\sum_{i=1}^nK_1\Big(\frac{X_i-x}{b_1}\Big)K_2\Big(\frac{Y_i-y}{h}\Big).$$

We also consider

$$\tilde f_{2n}(\epsilon) = \int\hat\varphi_n\big(x,\epsilon+m(x)\big)\,dx,$$

the estimator of $f$ based on the true regression function $m$. With these estimators, we first obtain the following theorem.

Theoreme 2.6. On suppose que b , bi et h decroissent vers telles que ln(l/&o)/ ln(lnn) —¥ oo, 
bi/(nbfy = 0(b 2 Q p ), p e [0,6], nbf oo et n (d+s) h 7(d+A) _^ ^ i orsque n _> ^ orS; 

sows ties 

conditions de regularity sur g, m, f, ip, et Kj, j = 0,1,2, on a 

1/2 

hn{e) - f(e) = Op \AMSE{b u h) + RT n (b ,h, h) 



ou 



et 



AMSE(b 1 ,h)=E % 



(/ 2 „(e)-/(e) 



O r [b\ + h 4 



1 

nbi 



RT n (b ,h,h) = b 4 +(b d Vbi) 

+ {b d v bi) 



1 



nbfh 
1 



n6g 



nbfh 5 



1 








1 




n 2 bl d h 3 


1 \ 3 







Building on this theorem, we recover results similar to those obtained with the estimator $\hat f_{1n}(\epsilon)$, in particular concerning the optimal choices of the first-step and second-step bandwidths for the estimation of $f(\epsilon)$.

• Optimal choice of the bandwidth $b_0$

Theorem 2.7. Set $b_0 = b_1$ and consider the bandwidth

$$b_0^* = b_0^*(h) = \arg\min_{b_0}RT_n(b_0,b_0,h),$$

where the minimization is over the set of bandwidths $b_0$ satisfying the assumptions of the previous theorem. Then $b_0^*$ satisfies

$$b_0^*\asymp\max\Big\{\Big(\frac{1}{n^2h^3}\Big)^{\frac{1}{d+4}},\ \Big(\frac{1}{n^3h^7}\Big)^{\frac{1}{2d+4}}\Big\},$$

et on a 

\ | ( \ \ 3+4 f \ \ 

RT n (b* Ql &Q, h) x - + max ■ 



2 ft 3 / ' V n 3 ft 7 



n \ \ n 

• Optimal choice of the bandwidth $h$

Theorem 2.8. Consider the bandwidth

$$h^* = \arg\min_h\big(\mathrm{AMSE}(b_0^*,h)+RT_n(b_0^*,b_0^*,h)\big),$$

where $b_0^* = b_0^*(h)$ is defined as in the previous theorem. Then
1. Pour d < 2, la fenetre ft* verifie 

ft* 



et on a 

(AM SE(b* , ft* ) + RT n (b* ,h*,h*)Y x 
2. Pour d> 3, h* satisfait 

„ 3 
1 \ 2d+l 



ft* 

ei on a 



AMSE(b* ,h*) + RT n (b*,b*,h*) 



I \ 2d+l 

n , 



The conclusion of this theorem is the same as that of the analogous theorem obtained with the estimator $\hat f_{1n}(\epsilon)$.

• Asymptotic normality

Theorem 2.9. Suppose that

$$nb_0^{d+4} = O(1),\qquad nb_0^4h = o(1),\qquad nb_0^dh^3\to\infty,$$

as $n\to\infty$. Then, under suitable regularity conditions, we have

$$\sqrt{nh}\,\big(\hat f_{2n}(\epsilon)-\bar f_{2n}(\epsilon)\big)\ \xrightarrow{d}\ N\Big(0,\ f(\epsilon)\int K_2^2(v)\,dv\Big),$$

with

$$\bar f_{2n}(\epsilon) = f(\epsilon)+\frac{b_1^2}{2}\int\mathbb{1}(x\in\mathcal{X}_0)\frac{\partial^2\varphi(x,\epsilon+m(x))}{\partial^2x}\,dx\int zK_1(z)z^\top dz+\frac{h^2}{2}\int\mathbb{1}(x\in\mathcal{X}_0)\frac{\partial^2\varphi(x,\epsilon+m(x))}{\partial^2y}\,dx\int v^2K_2(v)\,dv+o\big(b_1^2+h^2\big).$$

To conclude the thesis, we carry out numerical simulations to validate and highlight the results obtained with the estimators $\hat f_{1n}$ and $\hat f_{2n}$. We compare the performance of these estimators in terms of global and local mean squared errors. We also present research perspectives for future work.



Chapter 3

Nonparametric kernel estimation 
of the probability density function 
of regression errors using 
estimated residuals 



Abstract: In this chapter we deal with the nonparametric estimation of the density of the regression error term, assuming that it is independent of the covariate. We study the difference between the feasible estimator, which uses the estimated residuals, and the unfeasible one, which uses the true residuals. An optimal choice of the bandwidth used to estimate the residuals is given. We also study the asymptotic normality of the feasible kernel estimator and its rate-optimality.



3.1 Introduction 

Consider a sample $(X,Y),(X_1,Y_1),\dots,(X_n,Y_n)$ of independent and identically distributed (i.i.d.) random variables, where $Y$ is the univariate dependent variable and the covariate $X$ is of dimension $d$. Let $m(\cdot)$ be the conditional expectation of $Y$ given $X$ and let $\varepsilon$ be the related regression error term, so that the regression error model is

$$Y_i = m(X_i)+\varepsilon_i,\quad i=1,\dots,n. \tag{3.1.1}$$

We wish to estimate the probability density function (p.d.f.) of the regression error term, $f(\cdot)$, using the nonparametric residuals. Our potential applications are as follows. First, an estimate of the p.d.f. of $\varepsilon$ is an important tool for understanding the behavior of the residuals and therefore the fit of the regression model (3.1.1). This estimate of $f(\cdot)$ can be used for goodness-of-fit tests of a specified error distribution in a parametric regression setting. Some examples can be found in Loynes (1980), Akritas and Van Keilegom (2001), and Cheng and Sun (2008). The estimation of the density of the regression error term can also be useful for testing the symmetry of the residuals; see Ahmad and Li (1997) and Dette et al. (2002). Another use of the estimation of $f$ is the construction of nonparametric estimators of the density and hazard function of $Y$ given $X$, as described in Van Keilegom and Veraverbeke (2002). This estimation is also important when one is interested in the p.d.f. of the response variable $Y$; see Escanciano and Jacho-Chavez (2010). Note also that an estimate of the p.d.f. of the regression errors can be useful for proposing a mode forecast of $Y$ given $X=x$. This mode forecast is based on an estimate of $m(x)+\arg\max_{\epsilon\in\mathbb{R}}f(\epsilon)$.

Relatively little is known about the nonparametric estimation of the p.d.f. and the cumulative distribution function (c.d.f.) of the regression error. With few exceptions, the nonparametric literature focuses on the distribution of $Y$ given $X$; see Roussas (1967, 1991), Youndjé (1996) and references therein. Akritas and Van Keilegom (2001) estimate the c.d.f. of the regression error in a heteroscedastic model. The estimator proposed by these authors is based on a nonparametric estimation of the residuals. Their results show the impact of the estimation of the residuals on the limit distribution of the underlying estimator of the c.d.f. Müller, Schick and Wefelmeyer (2004) consider the estimation of moments of the regression error. Quite surprisingly, under appropriate conditions, the estimator based on the true errors is less efficient than the estimator which uses the nonparametric estimated residuals. The reason is that the latter estimator makes better use of the fact that the regression error $\varepsilon$ has mean zero. Efromovich (2005) considers adaptive estimation of the p.d.f. of the regression error. He gives a nonparametric estimator based on the estimated residuals, for which the Mean Integrated Squared Error (MISE) attains the minimax rate. Fu and Yang (2008) study the asymptotic normality of estimators of the regression error p.d.f. in nonlinear autoregressive models. Cheng (2005) establishes the asymptotic normality of an estimator of $f(\cdot)$ based on the estimated residuals. This estimator is constructed by splitting the sample into two parts: the first part is used to estimate the regression function, while the second part is used to compute the residuals and the estimator of $f(\cdot)$.

The focus of this chapter is the estimation of the p.d.f. of the regression error using the estimated residuals, under the assumption that the covariate $X$ and the regression error $\varepsilon$ are independent. In such a setup, it would be unwise to use a conditional approach based on the fact that $f(\epsilon) = f(\epsilon|x) = \varphi(m(x)+\epsilon|x)$, where $\varphi(\cdot|x)$ is the p.d.f. of $Y$ given $X=x$. Indeed, the estimation of $m(\cdot)$ and $\varphi(\cdot|x)$ is affected by the curse of dimensionality, so that the resulting estimator of $f(\cdot)$ would have a considerably slow rate of convergence if the dimension of $X$ is high. The approach proposed here uses a two-step procedure which, in a first step, replaces the unobserved regression error terms by nonparametric estimates $\hat\varepsilon_i$. In a second step, the estimated $\hat\varepsilon_i$'s are used to estimate $f(\cdot)$ nonparametrically, as if they were the true $\varepsilon_i$'s. While proceeding in this way can circumvent the curse of dimensionality, a challenging issue is to evaluate the impact of the estimated residuals on the final estimator of $f(\cdot)$. Hence one of the contributions of our study is to analyze the effect of the estimation of the residuals on the kernel estimators of the regression error p.d.f. Next, an optimal choice of the bandwidth used to estimate the residuals is given. Finally, we study the asymptotic normality of the feasible kernel estimator and its rate-optimality.

The rest of this chapter is organized as follows. Section 3.2 presents our estimators and establishes the asymptotic normality of the (naive) conditional estimator of the density of the regression error. Sections 3.3 and 3.4 gather our assumptions and main results. The conclusion of this chapter is given in Section 3.5, while the proofs of our results are gathered in Section 3.6 and in an appendix.

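3.2 Some nonparametric estimators of the density of the regression error

To illustrate the potential impact of the dimension $d$ of the $X_i$'s, let us first consider a naive conditional estimator of the p.d.f. $f(\cdot)$ of the regression error term $\varepsilon$. Let $\varphi(\cdot|x)$ and $f(\cdot|x)$ be the p.d.f. of $Y$ and $\varepsilon$ given $X=x$, respectively. Since $f(\epsilon|x) = \varphi(m(x)+\epsilon|x)$, the independence of $X$ and $\varepsilon$ gives

$$f(\epsilon) = f(\epsilon|x) = \varphi\big(m(x)+\epsilon\,|\,x\big). \tag{3.2.1}$$

Consider kernel functions $K_0$, $K_1$ and bandwidths $b_0$, $h_0$ and $h_1$. The expression (3.2.1) of $f$ suggests using the kernel nonparametric estimator

$$\hat f_n(\epsilon|x) = \frac{\dfrac{1}{nh_0^dh_1}\displaystyle\sum_{i=1}^nK_0\Big(\dfrac{X_i-x}{h_0}\Big)K_1\Big(\dfrac{Y_i-\hat m_n(x)-\epsilon}{h_1}\Big)}{\dfrac{1}{nh_0^d}\displaystyle\sum_{i=1}^nK_0\Big(\dfrac{X_i-x}{h_0}\Big)},$$

where $\hat m_n(x)$ is the Nadaraya-Watson (1964) estimator of $m(x)$ defined as

$$\hat m_n(x) = \frac{\sum_{j=1}^nY_jK_0\big((X_j-x)/b_0\big)}{\sum_{j=1}^nK_0\big((X_j-x)/b_0\big)}. \tag{3.2.2}$$

The first result presented in this chapter is the following proposition.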



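Proposition 3.1. Define

$$\mu_1(x,\epsilon) = \frac{\partial^2\varphi(x,m(x)+\epsilon)}{\partial^2x}\int zK_0(z)z^\top dz,\qquad \mu_2(x,\epsilon) = \frac{\partial^2\varphi(x,m(x)+\epsilon)}{\partial^2y}\int v^2K_1(v)\,dv,$$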

and suppose that ho decrease to such that nhQ d /\nn — > oo, ln(l//i )/ ln(lnn) — > oo and 

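(Ao): nAfo-Kx,, ^\ U + = o(l) ,
as $n\to\infty$. Then under Assumptions $(A_1)$-$(A_{10})$ given in the next section, we have

$$\sqrt{nh_0^dh_1}\,\big(\hat f_n(\epsilon|x) - \bar f_n(\epsilon|x)\big)\ \xrightarrow{d}\ N\Big(0,\ \frac{f(\epsilon|x)}{g(x)}\int\!\!\int K_0^2(z)K_1^2(v)\,dz\,dv\Big),$$

where $g(\cdot)$ is the marginal density of $X$ and

$$\bar f_n(\epsilon|x) = f(\epsilon|x) + \frac{h_0^2\mu_1(x,\epsilon)}{2g(x)} + \frac{h_1^2\mu_2(x,\epsilon)}{2g(x)} + o\big(h_0^2+h_1^2\big).$$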

This result suggests that an optimal choice of the bandwidths $h_0$ and $h_1$ should minimize the first-order terms of the asymptotic mean squared error expansion

$$\mathrm{AMSE}\big(\hat f_n(\epsilon|x)\big) = \Big(\frac{h_0^2\mu_1(x,\epsilon)}{2g(x)} + \frac{h_1^2\mu_2(x,\epsilon)}{2g(x)}\Big)^2 + \frac{f(\epsilon|x)\int K_0^2(z)\,dz\int K_1^2(v)\,dv}{nh_0^dh_1g(x)}.$$

Elementary calculations show that the resulting optimal bandwidths $h_0$ and $h_1$ are both proportional to $n^{-1/(d+5)}$, leading to the exact consistency rate $n^{-2/(d+5)}$ for $\hat f_n(\epsilon|x)$. In the case $d=1$, this rate is $n^{-1/3}$, which is worse than the rate $n^{-2/5}$ achieved by the optimal kernel estimator of a univariate density; see Bosq and Lecoutre (1987), Scott (1992), and Wand and Jones (1995). Note also that the exponent $2/(d+5)$ decreases to $0$ as the dimension $d$ grows. This indicates a negative impact of the dimension $d$ on the performance of the estimator, the so-called curse of dimensionality. The fact that $\hat f_n(\epsilon|x)$ is affected by the curse of dimensionality is a consequence of conditioning. Indeed, (3.2.1) identifies the unconditional density $f(\epsilon)$ with the conditional density of the regression error given the covariate.

To avoid this curse of dimensionality in the nonparametric kernel estimation of $f(\epsilon)$, the approach proposed here builds, in a first step, the estimated residuals

$$\hat\varepsilon_i = Y_i-\hat m_{in},\quad i=1,\dots,n, \tag{3.2.3}$$

where $\hat m_{in} = \hat m_{in}(X_i)$ is a leave-one-out version of the kernel regression estimator (3.2.2),

$$\hat m_{in} = \frac{\sum_{j=1,\,j\neq i}^nY_jK_0\big((X_j-X_i)/b_0\big)}{\sum_{j=1,\,j\neq i}^nK_0\big((X_j-X_i)/b_0\big)}. \tag{3.2.4}$$

It is tempting to use, in a second step, the estimated $\hat\varepsilon_i$ as if they were the true residuals $\varepsilon_i$. This would ignore that the $\hat m_n(X_i)$'s can deliver severely biased estimates of the $m(X_i)$'s for those $X_i$ which are close to the boundaries of the support $\mathcal{X}$ of the covariate distribution. To avoid this, our proposed estimator trims the observations $X_i$ outside an inner subset $\mathcal{X}_0$ of $\mathcal{X}$,

$$\hat f_{1n}(\epsilon) = \frac{1}{b_1\sum_{i=1}^n\mathbb{1}(X_i\in\mathcal{X}_0)}\sum_{i=1}^n\mathbb{1}(X_i\in\mathcal{X}_0)\,K_1\Big(\frac{\hat\varepsilon_i-\epsilon}{b_1}\Big). \tag{3.2.5}$$

This estimator is the so-called two-step kernel estimator of $f(\epsilon)$. In principle, it would be possible to assume that $\mathcal{X}_0$ grows to $\mathcal{X}$ at a rate negligible compared to the bandwidth $b_1$. This would give an estimator close to the more natural kernel estimator $\sum_{i=1}^nK_1\big((\hat\varepsilon_i-\epsilon)/b_1\big)/(nb_1)$. However, in the rest of the chapter, a fixed subset $\mathcal{X}_0$ is considered for the sake of simplicity.

Observe that the two-step kernel estimator $\hat f_{1n}(\epsilon)$ is a feasible estimator in the sense that it does not depend on any unknown quantity, as is desirable in practice. This contrasts with the unfeasible ideal kernel estimator

$$\tilde f_{1n}(\epsilon) = \frac{1}{b_1\sum_{i=1}^n\mathbb{1}(X_i\in\mathcal{X}_0)}\sum_{i=1}^n\mathbb{1}(X_i\in\mathcal{X}_0)\,K_1\Big(\frac{\varepsilon_i-\epsilon}{b_1}\Big), \tag{3.2.6}$$

which depends in particular on the unknown regression error terms. It is however intuitively clear that $\hat f_{1n}(\epsilon)$ and $\tilde f_{1n}(\epsilon)$ should be close, as illustrated by the results of Section 3.4.
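
The two-step construction can be sketched in Python for a scalar covariate. This is a minimal illustration rather than the implementation used in the thesis: the Gaussian kernels (which do not have the compact support required below), the trimming interval `[lo, hi]` playing the role of $\mathcal{X}_0$, the function names and the bandwidth values are all assumptions made here.

```python
import numpy as np

def gaussian(u):
    """Standard normal density, used for both kernels (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def leave_one_out_nw(x, y, b0):
    """Leave-one-out Nadaraya-Watson fits at each X_i (scalar covariate)."""
    w = gaussian((x[:, None] - x[None, :]) / b0)
    np.fill_diagonal(w, 0.0)                     # drop observation i from its own fit
    return (w @ y) / w.sum(axis=1)

def two_step_error_density(e, x, y, b0, b1, lo, hi):
    """Kernel density of the estimated residuals, trimmed to lo <= X_i <= hi."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    resid = y - leave_one_out_nw(x, y, b0)       # first step: estimated residuals
    keep = (x >= lo) & (x <= hi)                 # trimming set [lo, hi]
    e = np.atleast_1d(np.asarray(e, dtype=float))
    k = gaussian((resid[keep][None, :] - e[:, None]) / b1)
    return k.sum(axis=1) / (b1 * keep.sum())     # second step: density estimate

# Usage on simulated data (bandwidths chosen arbitrarily).
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + 0.5 * rng.standard_normal(n)
grid = np.linspace(-2.0, 2.0, 9)
print(two_step_error_density(grid, x, y, b0=0.08, b1=0.2, lo=0.1, hi=0.9))
```

Replacing `resid` by the true errors in the last step would give the unfeasible estimator; comparing the two outputs on simulated data is a simple way to see the effect of the first-step bandwidth `b0`.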

3.3 Assumptions 

The following assumptions are used in our main results.

$(A_1)$ The support $\mathcal{X}$ of $X$ is a compact subset of $\mathbb{R}^d$ and $\mathcal{X}_0$ is an inner closed subset of $\mathcal{X}$ with nonempty interior,

$(A_2)$ the p.d.f. $g(\cdot)$ of the i.i.d. covariates $X, X_i$ is strictly positive over $\mathcal{X}$, and has continuous second-order partial derivatives over $\mathcal{X}$,

$(A_3)$ the regression function $m(\cdot)$ has continuous second-order partial derivatives over $\mathcal{X}$,

$(A_4)$ the i.i.d. centered regression error terms $\varepsilon, \varepsilon_i$ have finite 6th moments, and are independent of the covariates $X, X_i$,

$(A_5)$ the probability density function $f(\cdot)$ has bounded continuous second-order derivatives over $\mathbb{R}$ and satisfies, for $h_p(\epsilon) = \epsilon^pf(\epsilon)$, $\sup_{\epsilon\in\mathbb{R}}|h_p^{(k)}(\epsilon)|<\infty$, $p\in[0,2]$, $k\in[0,2]$,

$(A_6)$ the p.d.f. $\varphi$ of $(X,Y)$ has bounded continuous second-order partial derivatives over $\mathbb{R}^d\times\mathbb{R}$,

$(A_7)$ the kernel $K_0$ is symmetric, continuous over $\mathbb{R}^d$ with support contained in $[-1/2,1/2]^d$, and $\int K_0(z)\,dz = 1$,

$(A_8)$ the kernel $K_1$ has a compact support, is three times continuously differentiable over $\mathbb{R}$, and satisfies $\int K_1(v)\,dv = 1$ and $\int vK_1(v)\,dv = 0$,

$(A_9)$ the bandwidth $b_0$ decreases to $0$ and satisfies, for $d^* = \sup\{d+2,2d\}$, $nb_0^{d^*}/\ln n\to\infty$ and $\ln(1/b_0)/\ln(\ln n)\to\infty$ when $n\to\infty$,

$(A_{10})$ the bandwidth $b_1$ decreases to $0$ and satisfies $n^{(d+8)}b_1^{7(d+4)}\to\infty$ when $n\to\infty$.

Assumptions $(A_2)$, $(A_3)$, $(A_5)$ and $(A_6)$ impose that all the functions to be estimated nonparametrically have two bounded derivatives. Consequently the conditions $\int zK_0(z)\,dz = 0$ and $\int vK_1(v)\,dv = 0$, as assumed in $(A_7)$ and $(A_8)$, are standard conditions ensuring that the biases of the resulting nonparametric estimators (3.2.2) and (3.2.6) are of order $b_0^2$ and $b_1^2$. Assumption $(A_4)$ states independence between the regression error terms and the covariates, which is the main condition for (3.2.1) to hold. The differentiability of $K_1$ imposed in $(A_8)$ is more specific to our two-step estimation method. Assumption $(A_5)$ is used to expand the two-step kernel estimator $\hat f_{1n}$ in (3.2.5) around the unfeasible one $\tilde f_{1n}$ from (3.2.6), using the residual estimation errors $\hat\varepsilon_i-\varepsilon_i$ and the derivatives of $K_1$ up to third order. Assumption $(A_9)$ is useful for obtaining the uniform convergence of the regression estimator $\hat m_n$ defined in (3.2.2) (see for instance Einmahl and Mason, 2005), and also gives a similar consistency result for the leave-one-out estimator $\hat m_{in}$ in (3.2.4). Assumption $(A_{10})$ is needed in the study of the difference between the feasible estimator $\hat f_{1n}$ and the unfeasible estimator $\tilde f_{1n}$.



3.4 Main results 

This section is devoted to our main results. The first result concerns the pointwise consistency of the nonparametric kernel estimator $\hat f_{1n}$ of the density $f$. Next, the optimal first-step and second-step bandwidths used to estimate $f$ are proposed. We finish this section by establishing the asymptotic normality of the estimator $\hat f_{1n}$.

3.4.1 Pointwise weak consistency 

The next result gives the order of the difference between the feasible estimator and the 
theoretical density of the regression error at a fixed point e. 

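Theorem 3.1. Under $(A_1)$-$(A_5)$ and $(A_7)$-$(A_{10})$, we have, when $b_0$ and $b_1$ go to $0$,

\[
\hat f_{1n}(e) - f(e) = O_{\mathbb{P}}\Big( AMSE(b_1) + R_n(b_0,b_1) \Big)^{1/2},
\]

where

\[
AMSE(b_1) = \mathbb{E}_n\Big[ \big( \tilde f_{1n}(e) - f(e) \big)^2 \Big] = O_{\mathbb{P}}\Big( b_1^4 + \frac{1}{nb_1} \Big)
\]

and

\[
R_n(b_0,b_1) = b_0^4
+ \Big[ \frac{1}{(nb_1^5)^{1/2}} + \Big( \frac{b_0^d}{b_1^3} \Big)^{1/2} \Big]^2 \Big( b_0^4 + \frac{1}{nb_0^d} \Big)^2
+ \Big[ \frac{1}{b_1} + \Big( \frac{b_0^d}{b_1^7} \Big)^{1/2} \Big]^2 \Big( b_0^4 + \frac{1}{nb_0^d} \Big)^3 .
\]

The result of Theorem 3.1 is based on the evaluation of the difference between $\hat f_{1n}(e)$ and $\tilde f_{1n}(e)$. This evaluation gives an indication about the impact of the estimation of the residuals on the nonparametric estimation of the regression error density.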
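
To make the two-step construction behind Theorem 3.1 concrete, the following is a minimal numerical sketch, not the implementation used in this chapter. It assumes a product Epanechnikov kernel for $K_0$ and the kernel proportional to $(1-v^2)^4$ for $K_1$ (compactly supported and three times continuously differentiable, in the spirit of $(A_8)$). The function names `k0`, `k1`, `two_step_density` and the boolean mask `inner` (standing for the indicator of $X_i \in \mathcal{X}_0$) are hypothetical.

```python
import numpy as np

def k0(z):
    """Product Epanechnikov kernel on [-1, 1]^d (illustrative choice for K_0)."""
    z = np.atleast_2d(z)
    return (0.75 * np.clip(1.0 - z**2, 0.0, None)).prod(axis=1)

def k1(v):
    """Kernel 315/256 * (1 - v^2)^4 on [-1, 1]: compact support, C^3 on the real line."""
    return (315.0 / 256.0) * np.clip(1.0 - np.asarray(v) ** 2, 0.0, None) ** 4

def two_step_density(X, Y, e_grid, b0, b1, inner):
    """Kernel density estimate of the error density from leave-one-out residuals.

    X: (n, d) covariates, Y: (n,) responses, inner: (n,) boolean trimming mask.
    """
    n, d = X.shape
    # diff[i, j] = (X_j - X_i) / b0, used by the leave-one-out regression at X_i
    diff = (X[None, :, :] - X[:, None, :]) / b0
    W = k0(diff.reshape(-1, d)).reshape(n, n)
    np.fill_diagonal(W, 0.0)                      # leave observation i out
    denom = W.sum(axis=1)
    ok = inner & (denom > 0)                      # implementation choice: drop empty windows
    m_loo = np.where(ok, W @ Y / np.where(denom > 0, denom, 1.0), 0.0)
    resid = (Y - m_loo)[ok]                       # estimated residuals
    u = (resid[None, :] - e_grid[:, None]) / b1
    return k1(u).sum(axis=1) / (b1 * resid.size)  # second-step kernel density estimate
```

With simulated data, the output can be compared to the true error density; the bandwidths `b0` and `b1` are free inputs here, and their recommended orders are the subject of Theorems 3.2 and 3.3.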

3.4.2 Optimal first-step and second-step bandwidths for the pointwise weak consistency

Theorem 3.2 below gives some guidelines for the choice of the optimal bandwidth $b_0$ used in the nonparametric estimation of the regression errors. As far as we know, the choice of an optimal $b_0$ has not been addressed before. In what follows, $a_n \asymp b_n$ means that $a_n = O(b_n)$ and $b_n = O(a_n)$, i.e. that there is a constant $C > 0$ such that $|a_n|/C \le |b_n| \le C|a_n|$ for $n$ large enough.

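Theorem 3.2. Suppose that $(A_1)$-$(A_5)$ and $(A_7)$-$(A_{10})$ are satisfied, and define

\[
b_0^* = b_0^*(b_1) = \arg\min_{b_0} R_n(b_0,b_1),
\]

where the minimization is performed over bandwidths $b_0$ fulfilling $(A_9)$. Then the bandwidth $b_0^*$ satisfies

\[
b_0^* \asymp \max\Big\{ \Big( \frac{1}{n^2b_1^3} \Big)^{\frac{1}{d+4}} , \Big( \frac{1}{n^3b_1^7} \Big)^{\frac{1}{2d+4}} \Big\},
\]

and we have

\[
R_n(b_0^*,b_1) \asymp \max\Big\{ \Big( \frac{1}{n^2b_1^3} \Big)^{\frac{4}{d+4}} , \Big( \frac{1}{n^3b_1^7} \Big)^{\frac{4}{2d+4}} \Big\}.
\]
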

Our next theorem gives the conditions under which the estimator $\hat f_{1n}(e)$ reaches the optimal rate $n^{-2/5}$ when $b_0$ takes the value $b_0^*$. We prove that for $d \le 2$, the bandwidth that minimizes the term $AMSE(b_1) + R_n(b_0^*,b_1)$ has the same order as $n^{-1/5}$, yielding the optimal order $n^{-2/5}$ for $\big( AMSE(b_1) + R_n(b_0^*,b_1) \big)^{1/2}$.



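Theorem 3.3. Assume that $(A_1)$-$(A_5)$ and $(A_7)$-$(A_{10})$ are satisfied, and set

\[
b_1^* = \arg\min_{b_1} \Big( AMSE(b_1) + R_n(b_0^*,b_1) \Big),
\]

where $b_0^* = b_0^*(b_1)$ is defined as in Theorem 3.2. Then

1. For $d \le 2$, the bandwidth $b_1^*$ satisfies

\[
b_1^* \asymp \Big( \frac{1}{n} \Big)^{\frac{1}{5}},
\]

and we have

\[
\Big( AMSE(b_1^*) + R_n(b_0^*,b_1^*) \Big)^{1/2} \asymp \Big( \frac{1}{n} \Big)^{\frac{2}{5}}.
\]

2. For $d \ge 3$, $b_1^*$ satisfies

\[
b_1^* \asymp \Big( \frac{1}{n} \Big)^{\frac{3}{2d+11}},
\]

and we have

\[
\Big( AMSE(b_1^*) + R_n(b_0^*,b_1^*) \Big)^{1/2} \asymp \Big( \frac{1}{n} \Big)^{\frac{6}{2d+11}}.
\]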

The results of Theorem 3.3 show that the rate $n^{-2/5}$ is reachable if and only if $d \le 2$. These results are derived from Theorem 3.2. The latter indicates that if $b_1$ is proportional to $n^{-1/5}$, the bandwidth $b_0^*$ has the same order as

\[
\Big( \frac{1}{n} \Big)^{\frac{8}{5(2d+4)}}.
\]

For $d \le 2$, this order of $b_0^*$ is smaller than the one of the optimal bandwidth $b_0^{**}$ obtained for pointwise or mean square estimation of $m(\cdot)$ using a kernel estimator. In fact, it has been shown in Nadaraya (1989, Chapter 4) that the optimal bandwidth $b_0^{**}$ for estimating $m(\cdot)$ is obtained by minimizing the order of the risk function

\[
r_n(b_0) = \mathbb{E}\Big[ \int \mathbb{1}(x \in \mathcal{X}) \big( \hat m_n(x) - m(x) \big)^2 \hat g_n^2(x) w(x)\,dx \Big],
\]

where $\hat g_n(x)$ is a nonparametric kernel estimator of $g(x)$, and $w(\cdot)$ is a nonnegative weight function, which is bounded and square integrable on $\mathcal{X}$. If $g(\cdot)$ and $m(\cdot)$ have continuous second order partial derivatives over their supports, Nadaraya (1989, Chapter 4) shows that $r_n(b_0)$ has the same order as $b_0^4 + 1/(nb_0^d)$, leading to the optimal bandwidth $b_0^{**} = n^{-1/(d+4)}$ for the convergence of the estimator $\hat m_n(\cdot)$ of $m(\cdot)$ in the set of square integrable functions on $\mathcal{X}$.

For $d = 1$, the optimal order of $b_0^*$ is $n^{-(1/5)\times(4/3)}$, which goes to $0$ slightly faster than $n^{-1/5}$, the optimal order of the bandwidth $b_0^{**}$ for the mean square nonparametric estimation of $m(\cdot)$.

For $d = 2$, the optimal order of $b_0^*$ is $n^{-1/5}$. Again this order goes to $0$ faster than the order $n^{-1/6}$ of the optimal bandwidth for the nonparametric estimation of the regression function with two covariates.
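
The comparison for $d = 1$ and $d = 2$ can be checked with exact rational arithmetic. The sketch below is only illustrative: it assumes $b_1$ of order $n^{-1/5}$, plugs it into the two candidate orders of Theorem 3.2, and compares the resulting exponent with the exponent $1/(d+4)$ of the regression bandwidth. The function names are hypothetical.

```python
from fractions import Fraction

def b0_star_exponent(d, b1_exponent=Fraction(1, 5)):
    """Exponent a such that b_0^* is of order n**(-a) when b_1 is of order n**(-b1_exponent)."""
    a1 = (2 - 3 * b1_exponent) / (d + 4)       # from (1 / (n^2 b_1^3))^(1/(d+4))
    a2 = (3 - 7 * b1_exponent) / (2 * d + 4)   # from (1 / (n^3 b_1^7))^(1/(2d+4))
    # The larger of two powers of 1/n is the one with the smaller exponent.
    return min(a1, a2)

def regression_exponent(d):
    """Exponent of the mean-square optimal regression bandwidth n**(-1/(d+4))."""
    return Fraction(1, d + 4)

for d in (1, 2):
    print(d, b0_star_exponent(d), regression_exponent(d))
```

A larger exponent means a bandwidth that goes to $0$ faster, so the printed pairs can be read directly against the two paragraphs above.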

However, for $d \ge 3$, we note that the order of $b_0^*$ goes to $0$ more slowly than $b_0^{**}$. Hence our results show that the optimal $\hat m_n(\cdot)$ for estimating $f(\cdot)$ should use a very small bandwidth $b_0$. This suggests that $\hat m_n(\cdot)$ should be less biased and should have a higher variance than the optimal kernel regression estimator of the estimation setup. Such a finding parallels Wang, Cai, Brown and Levine (2008), who show that a similar result holds when estimating the conditional variance of a heteroscedastic regression error term. However, Wang et al. (2008) do not give the order of the optimal bandwidth to be used for estimating the regression function in their heteroscedastic setup. These results show that estimators of $m(\cdot)$ with smaller bias should be preferred in our framework, compared to the case where the regression function $m(\cdot)$ is the parameter of interest.

3.4.3 Asymptotic normality 

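We now give an asymptotic normality result for the estimator $\hat f_{1n}(e)$.

Theorem 3.4. Assume that

\[
(A_{11}):\quad nb_0^{d+4} = O(1), \quad nb_0^4b_1 = o(1), \quad nb_0^db_1^3 \to \infty,
\]

when $n$ goes to $\infty$. Then under $(A_1)$-$(A_5)$, $(A_7)$-$(A_{10})$, we have

\[
\sqrt{nb_1}\,\Big( \hat f_{1n}(e) - \bar f_{1n}(e) \Big) \xrightarrow{d} N\Big( 0, \frac{f(e)}{\mathbb{P}(X \in \mathcal{X}_0)} \int K_1^2(v)\,dv \Big),
\]

where

\[
\bar f_{1n}(e) = f(e) + \frac{b_1^2}{2} f^{(2)}(e) \int v^2K_1(v)\,dv + o(b_1^2).
\]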

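The result of this theorem shows that the best choice $b_1^*$ for the bandwidth $b_1$ should achieve the minimum of the Asymptotic Mean Integrated Square Error

\[
AMISE = \frac{b_1^4}{4} \int \big( f^{(2)}(e) \big)^2 de \Big( \int v^2K_1(v)\,dv \Big)^2 + \frac{1}{nb_1\mathbb{P}(X \in \mathcal{X}_0)} \int K_1^2(v)\,dv,
\]

leading to the optimal bandwidth

\[
b_1^* = \left[ \frac{\int K_1^2(v)\,dv}{\mathbb{P}(X \in \mathcal{X}_0) \int \big( f^{(2)}(e) \big)^2 de \, \big[ \int v^2K_1(v)\,dv \big]^2} \right]^{1/5} n^{-1/5}.
\]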
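
As a numerical illustration of this bandwidth choice, the sketch below evaluates an AMISE of the form (squared bias term in $b_1^4$ plus variance term in $1/(nb_1)$) and its closed-form minimizer. It is a sketch under stated assumptions: the kernel $K_1(v) = \frac{315}{256}(1-v^2)^4$ on $[-1,1]$, a Gaussian error density with standard deviation `sigma` (for which $\int (f^{(2)})^2 = 3/(8\sqrt{\pi}\sigma^5)$), and a trimming probability `p_inner` standing for $\mathbb{P}(X \in \mathcal{X}_0)$. All names are hypothetical.

```python
import numpy as np

def kernel_constants(num=20001):
    """Return (int K_1^2, int v^2 K_1) for K_1(v) = 315/256 (1 - v^2)^4 on [-1, 1]."""
    v = np.linspace(-1.0, 1.0, num)
    dv = v[1] - v[0]
    k = (315.0 / 256.0) * (1.0 - v**2) ** 4
    return np.sum(k**2) * dv, np.sum(v**2 * k) * dv

def gaussian_roughness(sigma):
    """int (f'')^2 for the N(0, sigma^2) density."""
    return 3.0 / (8.0 * np.sqrt(np.pi) * sigma**5)

def amise(b1, n, rough, k_l2, k_m2, p_inner):
    """Squared-bias term plus variance term of the density estimator."""
    return 0.25 * b1**4 * rough * k_m2**2 + k_l2 / (n * b1 * p_inner)

def optimal_b1(n, rough, k_l2, k_m2, p_inner):
    """Closed-form minimizer of amise() in b1."""
    return (k_l2 / (p_inner * rough * k_m2**2)) ** 0.2 * n ** (-0.2)

k_l2, k_m2 = kernel_constants()
rough = gaussian_roughness(0.3)
b_star = optimal_b1(1000, rough, k_l2, k_m2, 0.8)
```

In practice $f^{(2)}$ is unknown, so the Gaussian roughness above only plays the role of a reference value.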



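We also note that for $d \le 2$, $b_1 = b_1^*$ and $b_0 = b_0^*$, Theorems 3.3 and 3.2 give

\[
b_1 \asymp \Big( \frac{1}{n} \Big)^{\frac{1}{5}}, \qquad b_0 \asymp \Big( \frac{1}{n} \Big)^{\frac{8}{5(2d+4)}},
\]

which yields that

\[
nb_0^{d+4} \asymp \Big( \frac{1}{n} \Big)^{\frac{12-2d}{5(2d+4)}}, \qquad
nb_0^4b_1 \asymp \Big( \frac{1}{n} \Big)^{\frac{16-8d}{5(2d+4)}}, \qquad
nb_0^db_1^3 \asymp \Big( \frac{1}{n} \Big)^{\frac{4d-8}{5(2d+4)}}.
\]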



This shows that for $d = 1$, Assumption $(A_{11})$ is realizable with the optimal bandwidths $b_0^*$ and $b_1^*$. But with these bandwidths, the last constraint of $(A_{11})$ is not satisfied for $d = 2$, since $nb_0^{*d}b_1^{*3}$ is bounded when $n \to \infty$.
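
The bookkeeping behind this remark can be sketched as follows. The snippet assumes the orders $b_1^* \asymp n^{-1/5}$ and $b_0^* \asymp n^{-8/(5(2d+4))}$ discussed above and returns, for each quantity appearing in $(A_{11})$, the exponent $\gamma$ such that the quantity is of order $n^{\gamma}$. The function name and dictionary keys are hypothetical.

```python
from fractions import Fraction

def a11_exponents(d):
    """Exponents gamma (quantity ~ n**gamma) of the three terms in (A11) at the optimal orders."""
    b1 = Fraction(1, 5)                 # b_1^* ~ n^(-1/5)
    b0 = Fraction(8, 5 * (2 * d + 4))   # b_0^* ~ n^(-8/(5(2d+4)))
    return {
        "n b0^(d+4)": 1 - (d + 4) * b0,     # should be <= 0 for O(1)
        "n b0^4 b1": 1 - 4 * b0 - b1,       # should be < 0 for o(1)
        "n b0^d b1^3": 1 - d * b0 - 3 * b1, # should be > 0 to diverge
    }

for d in (1, 2):
    print(d, a11_exponents(d))
```

An exponent equal to $0$ indicates a quantity that stays bounded rather than diverging or vanishing, which is how the case $d = 2$ shows up here.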

3.5 Conclusion

The aim of this chapter was to study the nonparametric kernel estimation of the probability density function of the regression error using the estimated residuals. The difference between the feasible estimator, which uses the estimated residuals, and the unfeasible one, which uses the true residuals, is studied. An optimal choice of the first-step bandwidth used to estimate the residuals is also proposed. Moreover, the asymptotic normality of the feasible kernel estimator and its rate-optimality are established. One of the contributions of this chapter is the analysis of the impact of the estimated residuals on the kernel estimator of the regression error p.d.f.

In our setup, the strategy was to use a two-step procedure which, in a first step, replaces the unobserved residual terms by some nonparametric estimators $\hat\varepsilon_i$. In a second step, the "pseudo-observations" $\hat\varepsilon_i$ are used to estimate the p.d.f. $f(\cdot)$, as if they were the true $\varepsilon_i$'s. While proceeding in this way can circumvent the curse of dimensionality, a challenging issue was to measure the impact of the residuals estimated in the first nonparametric step on the final estimator of $f(\cdot)$, and to find the order of the optimal first-step bandwidth $b_0$. For this choice of $b_0$, our results indicate that the optimal bandwidth to be used for estimating the regression function $m(\cdot)$ should be smaller than the optimal bandwidth for the mean square estimation of $m(\cdot)$. That is to say, the best estimator $\hat m_n(\cdot)$ of the regression function $m(\cdot)$ needed for estimating $f(\cdot)$ should have a lower bias and a higher variance than the optimal kernel regression estimator of the estimation setup. With this appropriate choice of $b_0$, it has been seen that for $d \le 2$, the nonparametric estimator $\hat f_{1n}(e)$ of $f$ can reach the optimal rate $n^{-2/5}$, which corresponds to the exact consistency rate reached by the kernel density estimator of a real-valued variable. Hence our main conclusion is that for $d \le 2$, the estimator $\hat f_{1n}(e)$ used for estimating $f(e)$ is not affected by the curse of dimensionality, since there is no negative effect coming from the estimation of the residuals on the final estimator of $f(e)$.

3.6 Proofs section

Intermediate Lemmas for Proposition 3.1 and Theorem 3.1
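
Lemma 3.1. Define, for $x \in \mathcal{X}_0$,

\[
\hat g_n(x) = \frac{1}{nb_0^d} \sum_{i=1}^n K_0\Big( \frac{X_i - x}{b_0} \Big), \qquad \bar g_n(x) = \mathbb{E}\big[ \hat g_n(x) \big].
\]

Then under $(A_1)$-$(A_2)$, $(A_4)$, $(A_7)$ and $(A_9)$, we have, when $b_0$ goes to $0$,

\[
\sup_{x \in \mathcal{X}_0} \big| \bar g_n(x) - g(x) \big| = O\big( b_0^2 \big), \qquad
\sup_{x \in \mathcal{X}_0} \big| \hat g_n(x) - \bar g_n(x) \big| = O_{\mathbb{P}}\Big( b_0^4 + \frac{\ln n}{nb_0^d} \Big)^{1/2},
\]

and

\[
\sup_{x \in \mathcal{X}_0} \Big| \frac{1}{\hat g_n(x)} - \frac{1}{g(x)} \Big| = O_{\mathbb{P}}\Big( b_0^4 + \frac{\ln n}{nb_0^d} \Big)^{1/2}.
\]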



Lemma 3.2. Under $(A_1)$-$(A_4)$, $(A_7)$ and $(A_9)$, we have

\[
\sup_{x \in \mathcal{X}_0} \big| \hat m_n(x) - m(x) \big| = O_{\mathbb{P}}\Big( b_0^4 + \frac{\ln n}{nb_0^d} \Big)^{1/2}.
\]



Lemma 3.3. Define for (x, y) € E d x E, 



X)- 



fn(e\x) 



nK 



Then under (A±) — {A3), (Ag) — (Ag), we have, when n goes to infinity, 



fn(e\x) - f n (4 x ) = °P 



ih(jhi 



1/2 



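Lemma 3.4. Set, for $(x,y) \in \mathbb{R}^d \times \mathbb{R}$,

\[
\varphi_{in}(x,y) = \frac{1}{h_0^dh_1} K_0\Big( \frac{X_i - x}{h_0} \Big) K_1\Big( \frac{Y_i - y}{h_1} \Big).
\]

Then, under $(A_6)$-$(A_8)$, we have, for $x$ in $\mathcal{X}_0$ and $y$ in $\mathbb{R}$, $h_0$ and $h_1$ going to $0$, and for some constant $C > 0$,

\[
\mathbb{E}\big[ \varphi_{in}(x,y) \big] - \varphi(x,y) = \frac{h_0^2}{2} \frac{\partial^2\varphi(x,y)}{\partial x^2} \int zK_0(z)z^{\top}dz + \frac{h_1^2}{2} \frac{\partial^2\varphi(x,y)}{\partial y^2} \int v^2K_1(v)\,dv + o\big( h_0^2 + h_1^2 \big),
\]

\[
\mathrm{Var}\big[ \varphi_{in}(x,y) \big] = \frac{\varphi(x,y)}{h_0^dh_1} \iint K_0^2(z)K_1^2(v)\,dz\,dv + o\Big( \frac{1}{h_0^dh_1} \Big),
\]

\[
\mathbb{E}\big| \varphi_{in}(x,y) - \mathbb{E}\varphi_{in}(x,y) \big|^3 \le \frac{C\varphi(x,y)}{h_0^{2d}h_1^2} \iint \big| K_0(z)K_1(v) \big|^3 dz\,dv + o\Big( \frac{1}{h_0^{2d}h_1^2} \Big).
\]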
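
Lemma 3.5. Set

\[
f_{in}(e) = \frac{\mathbb{1}(X_i \in \mathcal{X}_0)}{b_1\mathbb{P}(X \in \mathcal{X}_0)} K_1\Big( \frac{\varepsilon_i - e}{b_1} \Big).
\]

Then under $(A_4)$, $(A_5)$ and $(A_8)$, we have, for $b_1$ going to $0$, and for some constant $C > 0$,

\[
\mathbb{E} f_{in}(e) = f(e) + \frac{b_1^2}{2} f^{(2)}(e) \int v^2K_1(v)\,dv + o\big( b_1^2 \big),
\]

\[
\mathrm{Var}\big( f_{in}(e) \big) = \frac{f(e)}{b_1\mathbb{P}(X \in \mathcal{X}_0)} \int K_1^2(v)\,dv + o\Big( \frac{1}{b_1} \Big),
\]

\[
\mathbb{E}\big| f_{in}(e) - \mathbb{E} f_{in}(e) \big|^3 \le \frac{Cf(e)}{b_1^2} \int \big| K_1(v) \big|^3 dv + o\Big( \frac{1}{b_1^2} \Big).
\]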



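Lemma 3.6. Define

\[
S_n = \sum_{i=1}^n \mathbb{1}(X_i \in \mathcal{X}_0) \big( \hat m_{in} - m(X_i) \big) K_1^{(1)}\Big( \frac{\varepsilon_i - e}{b_1} \Big),
\]

\[
T_n = \sum_{i=1}^n \mathbb{1}(X_i \in \mathcal{X}_0) \big( \hat m_{in} - m(X_i) \big)^2 K_1^{(2)}\Big( \frac{\varepsilon_i - e}{b_1} \Big),
\]

\[
R_n = \sum_{i=1}^n \mathbb{1}(X_i \in \mathcal{X}_0) \big( \hat m_{in} - m(X_i) \big)^3 \int_0^1 (1-t)^2 K_1^{(3)}\Big( \frac{\varepsilon_i - t(\hat m_{in} - m(X_i)) - e}{b_1} \Big) dt.
\]

Then under $(A_1)$-$(A_5)$ and $(A_7)$-$(A_{10})$, we have, for $b_0$ and $b_1$ small enough,

\[
S_n = O_{\mathbb{P}}\Big[ b_0^2 \big( nb_1^2 + (nb_1)^{1/2} \big) + \Big( nb_1^4 + \frac{b_1}{b_0^d} \Big)^{1/2} \Big],
\]

\[
T_n = O_{\mathbb{P}}\Big[ \Big( nb_1^3 + (nb_1)^{1/2} + \big( n^2b_0^db_1^3 \big)^{1/2} \Big) \Big( b_0^4 + \frac{1}{nb_0^d} \Big) \Big],
\]

\[
R_n = O_{\mathbb{P}}\Big[ \Big( nb_1^3 + \big( n^2b_0^db_1 \big)^{1/2} \Big) \Big( b_0^4 + \frac{1}{nb_0^d} \Big)^{3/2} \Big].
\]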



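Lemma 3.7. Under $(A_5)$ and $(A_8)$ we have, for some constant $C > 0$, and for any $e$ in $\mathbb{R}$ and $p \in [0,2]$,

\[
\Big| \int K_1^{(1)}\Big( \frac{\varepsilon - e}{b_1} \Big)^2 \varepsilon^p f(\varepsilon)\,d\varepsilon \Big| \le Cb_1, \qquad
\Big| \int K_1^{(1)}\Big( \frac{\varepsilon - e}{b_1} \Big) \varepsilon^p f(\varepsilon)\,d\varepsilon \Big| \le Cb_1^2, \tag{3.6.1}
\]

\[
\Big| \int K_1^{(2)}\Big( \frac{\varepsilon - e}{b_1} \Big)^2 \varepsilon^p f(\varepsilon)\,d\varepsilon \Big| \le Cb_1, \qquad
\Big| \int K_1^{(2)}\Big( \frac{\varepsilon - e}{b_1} \Big) \varepsilon^p f(\varepsilon)\,d\varepsilon \Big| \le Cb_1^3, \tag{3.6.2}
\]

\[
\Big| \int K_1^{(3)}\Big( \frac{\varepsilon - e}{b_1} \Big)^2 \varepsilon^p f(\varepsilon)\,d\varepsilon \Big| \le Cb_1, \qquad
\Big| \int K_1^{(3)}\Big( \frac{\varepsilon - e}{b_1} \Big) \varepsilon^p f(\varepsilon)\,d\varepsilon \Big| \le Cb_1^3. \tag{3.6.3}
\]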



Lemma 3.8. Set 



1 pfr e *o) 

nb^gi., 



■ 2 (m^O-m^))^' " 



&0 
Then, under $(A_1)$-$(A_5)$ and $(A_7)$-$(A_{10})$, we have, when $b_0$ and $b_1$ go to $0$,

i 

^7 



Lemma 3.9. Set 



y. 

'—'in — 



i {Xi e x ) 

nbigi, 



Then, under (A\) — (A§) and (A 7 ) — (A\q), we have 



n , 
E e iM 



Xj — Xi 



8 = 1 



(i) ( £8 -e 
&1 



<y\nb\ + % 



1/2 



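Lemma 3.10. Let $\mathbb{E}_n[\cdot]$ be the conditional mean given $X_1, \ldots, X_n$. Then under $(A_1)$-$(A_5)$ and $(A_7)$-$(A_9)$, we have, for $b_0$ going to $0$,

\[
\sup_{1 \le i \le n} \mathbb{E}_n\Big[ \mathbb{1}(X_i \in \mathcal{X}_0) \big( \hat m_{in} - m(X_i) \big)^4 \Big] = O_{\mathbb{P}}\Big( b_0^4 + \frac{1}{nb_0^d} \Big)^2,
\]

\[
\sup_{1 \le i \le n} \mathbb{E}_n\Big[ \mathbb{1}(X_i \in \mathcal{X}_0) \big( \hat m_{in} - m(X_i) \big)^6 \Big] = O_{\mathbb{P}}\Big( b_0^4 + \frac{1}{nb_0^d} \Big)^3.
\]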



Lemma 3.11. Assume that $(A_4)$ and $(A_7)$ hold. Then, for any $1 \le i \ne j \le n$, and for any $e$ in $\mathbb{R}$,

\[
\big( \hat m_{in} - m(X_i), \varepsilon_i \big) \quad \text{and} \quad \big( \hat m_{jn} - m(X_j), \varepsilon_j \big)
\]

are independent given $X_1, \ldots, X_n$, provided that $\|X_i - X_j\| \ge Cb_0$, for some constant $C > 0$.

Lemma 3.12. Let Var„(-) and Cov n (-) be respectively the conditional variance and the conditional 
covariance given X\, . . . , X n , and set 



( m = 1 (Xi G Xo) (rh m - m(X l )) I K\ 



2^(2) fc i 



Si - e 



Then under (A\) — (A 5 ) and (A 7 ) — (A 9 ), we have, for n going to infinity, 



n n / 1 

E E C0V « (C*«. On) = Or (n 2 6^I /2 ) U + — d 

8=1 3 = 1 ^ 



All these lemmas are proved in Appendix A. 

Proof of Proposition 3.1



Define f n (e\x) as in Lemma 3.3 and note that by this lemma, we have 



fn(e\x) = f n (e\x) + o P 



1/2 



(3.6.4) 



The asymptotic distribution of the first term in (3.6.4) is derived by applying the Lyapounov Central Limit Theorem for triangular arrays (see e.g. Billingsley 1968, Theorem 7.3). Define for $x \in \mathcal{X}_0$ and $y \in \mathbb{R}$,



1 

9n{x,y) = —rjT-y^Ko 
nh%hx ^ 

and observe that 



Xj-x 
ha 



fn(e\x) 



hi 

ip n (x,m(x) + e) 

9n{x) 



9n{x) 



Let now ipi n (x,y) be as in Lemma |3.4[ and note that 

<Pn(x,y) = ^Y^(^Pin(x,y) -E[(p in (x,y)]j + E [tpin{x, y)] 
The second and third inequalities in Lemma 3.4 give, since h^hi goes to 0, 



(3.6.5) 



(3.6.* 



T,i=i E \<Pin(x,y) - E<p in (x,y)\ £ 
(J27=i YaT i^n(x,y)]f 



< 



°^fJJ \K (z)K 1 {v)\*dzdo + o(j j ^ 



^1 jK^z)Kl{v)dvdz + o 
Hence the Lyapounov Central Limit Theorem gives, since nh^hi diverges under (Ao), 

E™=1 {<Pin(.X,y) ~ ®[<Pin(x,V)]} d 



n 



= O{h d h x ) = o(l). 



(E™=1 Var [<Pin(x,y)]) 



1/2 



AT(0,1), 



so that 



\/nh^h\ ^ 



J2($in(x,y) -E[fr n (x,y)}^ A N ^0, ip(x, y) J J K^(z)K 2 (v)dzdv^ . (3.6.7) 



Further, a similar proof as the one of Lemma 3.1 gives 

1 1 



O r ( h 4 



lnn\ 



1/2 



g n (x) g{x) 

Hence by this equality, it follows that, taking y = m{x) + e in (3.6.7), and by (3.6.4)-(3.6.6), 
\f^4hi (fn(e\x) - 7 n (e\x)) A N U f -^- J J K*(z)Kf(v)dzdv\ , 



(3.6.8) 



where 



fnWx) 



E [gin (x, m(x) + e)] 

9n(x) 



This yields the result of Proposition 3.1 since the first equality of Lemma 3.4 and (3.6.8 1 yield, for 
ho and hi small enough, 



f n {e\x) = f(e\x) + 



hi d 2 ip (x, m(x) + e) 



2g(x) d 2 x 
hi d 2 ip(x, m(x) + e) 



2g(x) 



d 2 y 



zKq(z)z t dz 
v 2 K 1 (v)dv + o(h 2 ) + h 2 1 ) .□ 

Proof of Theorem 3.1

The proof of the theorem is based upon the following equalities:



/in(e) - /in(e) 



h \b\) 



i 2 b*b\ 



1/2 



Of 



1 



uds 1/2' 



and 



/i„(e) - /(e) = P 6f + 



n6i 



1/2 



(3.6.9) 



(3.6.10) 



Indeed, since 



7in(e) - /(e) = (/i»(e) - /(e)) + 7m (e) - hn(e), it 



then follows by (3.6.10) and 



(3.6.9) that 



/i„(e)-/(e) = Op 



1 



1 



^ + nb, +b4 ° + n~' v-hp; \ (,//,•')' 2 V//',' 



1/2 



Op 



'0 



'0 



1/2 



This yields the result of the Theorem, since under (Ag) and {Aiq), we have 

2 



1=0 (4- 



1 



O ^ 



Hence, it remains to prove (3.6.9) and (3.6.10). For this, define S n , R n and T n as in Lemma 3.6 



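Since $\hat\varepsilon_i - \varepsilon_i = -\big( \hat m_{in} - m(X_i) \big)$ and $K_1$ is three times continuously differentiable under $(A_8)$, the third-order Taylor expansion with integral remainder gives

\[
\hat f_{1n}(e) - \tilde f_{1n}(e) = - \frac{1}{b_1 \sum_{i=1}^n \mathbb{1}(X_i \in \mathcal{X}_0)} \Big( \frac{S_n}{b_1} - \frac{T_n}{2b_1^2} + \frac{R_n}{2b_1^3} \Big).
\]

Therefore, since

\[
\sum_{i=1}^n \mathbb{1}(X_i \in \mathcal{X}_0) = n \big( \mathbb{P}(X \in \mathcal{X}_0) + o_{\mathbb{P}}(1) \big)
\]

by the Law of large numbers, Lemma 3.6 then gives

\[
\hat f_{1n}(e) - \tilde f_{1n}(e) = O_{\mathbb{P}}\Big( \frac{1}{nb_1^2} \Big) S_n + O_{\mathbb{P}}\Big( \frac{1}{nb_1^3} \Big) T_n + O_{\mathbb{P}}\Big( \frac{1}{nb_1^4} \Big) R_n .
\]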



1/2' 
This yields (3.6.9), since under $(A_9)$ and $(A_{10})$, we have $b_0 \to 0$, $nb_0^{d+2} \to \infty$



and nb\ — > oo, so that 



6811 



1 



^6?) 1 /2 



(nbl) 1 / 2 \bf 



d\ 1/2- 



(n&f)V2 \b 3 J 



bt\ 1/2 



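For (3.6.10), note that

\[
\mathbb{E}_n\Big[ \big( \tilde f_{1n}(e) - f(e) \big)^2 \Big] = \mathrm{Var}_n\big( \tilde f_{1n}(e) \big) + \Big( \mathbb{E}_n\big[ \tilde f_{1n}(e) \big] - f(e) \Big)^2, \tag{3.6.11}
\]

with, using $(A_4)$,

\[
\mathrm{Var}_n\big( \tilde f_{1n}(e) \big) = \frac{1}{\big( b_1 \sum_{i=1}^n \mathbb{1}(X_i \in \mathcal{X}_0) \big)^2} \sum_{i=1}^n \mathbb{1}(X_i \in \mathcal{X}_0)\, \mathrm{Var}\Big( K_1\Big( \frac{\varepsilon - e}{b_1} \Big) \Big).
\]

Therefore, since the Cauchy-Schwarz inequality gives

\[
\mathrm{Var}\Big( K_1\Big( \frac{\varepsilon - e}{b_1} \Big) \Big) \le \mathbb{E}\Big[ K_1^2\Big( \frac{\varepsilon - e}{b_1} \Big) \Big] \le b_1 \int K_1^2(v) f(e + b_1v)\,dv,
\]

this bound and the equality above yield, under $(A_5)$ and $(A_8)$,

\[
\mathrm{Var}_n\big( \tilde f_{1n}(e) \big) \le \frac{C}{b_1 \sum_{i=1}^n \mathbb{1}(X_i \in \mathcal{X}_0)} = O_{\mathbb{P}}\Big( \frac{1}{nb_1} \Big). \tag{3.6.12}
\]

For the second term in (3.6.11), we have

\[
\mathbb{E}_n\big[ \tilde f_{1n}(e) \big] = \frac{1}{b_1 \sum_{i=1}^n \mathbb{1}(X_i \in \mathcal{X}_0)} \sum_{i=1}^n \mathbb{1}(X_i \in \mathcal{X}_0)\, \mathbb{E}\Big[ K_1\Big( \frac{\varepsilon - e}{b_1} \Big) \Big]. \tag{3.6.13}
\]

By $(A_8)$, $K_1$ is symmetric, has a compact support, with $\int vK_1(v)\,dv = 0$ and $\int K_1(v)\,dv = 1$. Therefore, since under $(A_5)$ $f$ has bounded continuous second order derivatives, this yields for some $\theta = \theta(e, b_1v)$,

\[
\mathbb{E}\Big[ K_1\Big( \frac{\varepsilon - e}{b_1} \Big) \Big]
= b_1 \int K_1(v) f(e + b_1v)\,dv
= b_1 \int K_1(v) \Big[ f(e) + b_1vf^{(1)}(e) + \frac{b_1^2v^2}{2} f^{(2)}(e + \theta b_1v) \Big] dv
= b_1f(e) + \frac{b_1^3}{2} \int v^2K_1(v) f^{(2)}(e + \theta b_1v)\,dv.
\]

Hence this equality and (3.6.13) give

\[
\mathbb{E}_n\big[ \tilde f_{1n}(e) \big] = f(e) + \frac{b_1^2}{2} \int v^2K_1(v) f^{(2)}(e + \theta b_1v)\,dv,
\]

so that

\[
\mathbb{E}_n\big[ \tilde f_{1n}(e) \big] - f(e) = O_{\mathbb{P}}\big( b_1^2 \big).
\]

Combining this result with (3.6.12) and (3.6.11), we obtain, by the Tchebychev inequality,

\[
\tilde f_{1n}(e) - f(e) = O_{\mathbb{P}}\Big( b_1^4 + \frac{1}{nb_1} \Big)^{1/2}.
\]

This proves (3.6.10), and then completes the proof of the theorem. □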
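
Proof of Theorem 3.2

Recall that

\[
R_n(b_0,b_1) = b_0^4
+ \Big[ \frac{1}{(nb_1^5)^{1/2}} + \Big( \frac{b_0^d}{b_1^3} \Big)^{1/2} \Big]^2 \Big( b_0^4 + \frac{1}{nb_0^d} \Big)^2
+ \Big[ \frac{1}{b_1} + \Big( \frac{b_0^d}{b_1^7} \Big)^{1/2} \Big]^2 \Big( b_0^4 + \frac{1}{nb_0^d} \Big)^3,
\]

and note that

\[
\max\Big\{ \Big( \frac{1}{n^2b_1^3} \Big)^{\frac{1}{d+4}}, \Big( \frac{1}{n^3b_1^7} \Big)^{\frac{1}{2d+4}} \Big\} \asymp \Big( \frac{1}{n^2b_1^3} \Big)^{\frac{1}{d+4}}
\]

if and only if $n^{4-d}b_1^{d+16} \to \infty$. To find the order of $b_0^*$, we shall deal with the cases $nb_0^{d+4} \to \infty$ and $nb_0^{d+4} = O(1)$.

First assume that $nb_0^{d+4} \to \infty$. More precisely, we suppose that $b_0$ is in $[(u_n/n)^{1/(d+4)}, +\infty)$, where $u_n \to \infty$. Since $1/(nb_0^d) = O(b_0^4)$ for all these $b_0$, we have

\[
\Big( b_0^4 + \frac{1}{nb_0^d} \Big)^2 \asymp \big( b_0^4 \big)^2, \qquad \Big( b_0^4 + \frac{1}{nb_0^d} \Big)^3 \asymp \big( b_0^4 \big)^3.
\]

Hence the order of $b_0^*$ is computed by minimizing the function

\[
b_0 \mapsto b_0^4 + \Big[ \frac{1}{(nb_1^5)^{1/2}} + \Big( \frac{b_0^d}{b_1^3} \Big)^{1/2} \Big]^2 \big( b_0^4 \big)^2 + \Big[ \frac{1}{b_1} + \Big( \frac{b_0^d}{b_1^7} \Big)^{1/2} \Big]^2 \big( b_0^4 \big)^3.
\]

Since this function is increasing with $b_0$, the minimum of $R_n(\cdot, b_1)$ is achieved for $b_0^{**} = (u_n/n)^{1/(d+4)}$. We shall prove later on that this choice of $b_0^{**}$ is irrelevant compared to the one arising when $nb_0^{d+4} = O(1)$.

Consider now the case $nb_0^{d+4} = O(1)$, i.e. $b_0^4 = O\big( 1/(nb_0^d) \big)$. This gives

\[
\Big( b_0^4 + \frac{1}{nb_0^d} \Big)^2 \asymp \frac{1}{n^2b_0^{2d}}, \qquad \Big( b_0^4 + \frac{1}{nb_0^d} \Big)^3 \asymp \frac{1}{n^3b_0^{3d}}.
\]

Moreover if $nb_0^db_1^4 \to \infty$, we have, since $nb_0^{2d} \to \infty$ under $(A_9)$,

\[
\Big[ \frac{1}{(nb_1^5)^{1/2}} + \Big( \frac{b_0^d}{b_1^3} \Big)^{1/2} \Big]^2 \frac{1}{n^2b_0^{2d}} \asymp \frac{1}{n^2b_0^db_1^3}, \qquad
\Big[ \frac{1}{b_1} + \Big( \frac{b_0^d}{b_1^7} \Big)^{1/2} \Big]^2 \frac{1}{n^3b_0^{3d}} = o\Big( \frac{1}{n^2b_0^db_1^3} \Big).
\]

Hence the order of $b_0^*$ is obtained by finding the minimum of the function $b_0^4 + 1/(n^2b_0^db_1^3)$. The minimization of this function gives a solution $b_0$ such that

\[
b_0 \asymp \Big( \frac{1}{n^2b_1^3} \Big)^{\frac{1}{d+4}}, \qquad R_n(b_0,b_1) \asymp \Big( \frac{1}{n^2b_1^3} \Big)^{\frac{4}{d+4}}.
\]

This value satisfies the constraints $nb_0^{d+4} = O(1)$ and $nb_0^db_1^4 \to \infty$ when $n^{4-d}b_1^{d+16} \to \infty$.

If now $nb_0^{d+4} = O(1)$ but $nb_0^db_1^4 = O(1)$, we have, since $nb_0^{2d} \to \infty$,

\[
\Big[ \frac{1}{(nb_1^5)^{1/2}} + \Big( \frac{b_0^d}{b_1^3} \Big)^{1/2} \Big]^2 \frac{1}{n^2b_0^{2d}} = O\Big( \frac{1}{n^3b_0^{2d}b_1^7} \Big), \qquad
\Big[ \frac{1}{b_1} + \Big( \frac{b_0^d}{b_1^7} \Big)^{1/2} \Big]^2 \frac{1}{n^3b_0^{3d}} \asymp \frac{1}{n^3b_0^{2d}b_1^7}.
\]
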
.61/ U 3 ^/ 6? U 3 6 V W W a &§V Ul 
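In this case, $b_0^*$ is obtained by minimizing the function $b_0^4 + 1/(n^3b_0^{2d}b_1^7)$, for which the solution $b_0$ verifies

\[
b_0 \asymp \Big( \frac{1}{n^3b_1^7} \Big)^{\frac{1}{2d+4}}, \qquad R_n(b_0,b_1) \asymp \Big( \frac{1}{n^3b_1^7} \Big)^{\frac{4}{2d+4}}.
\]

This solution fulfills the constraint $nb_0^db_1^4 = O(1)$ when $n^{4-d}b_1^{d+16} = O(1)$. Hence we can conclude that for $b_0^4 = O\big( 1/(nb_0^d) \big)$, the bandwidth $b_0^*$ satisfies

\[
b_0^* \asymp \max\Big\{ \Big( \frac{1}{n^2b_1^3} \Big)^{\frac{1}{d+4}}, \Big( \frac{1}{n^3b_1^7} \Big)^{\frac{1}{2d+4}} \Big\},
\]

which leads to

\[
R_n(b_0^*,b_1) \asymp \max\Big\{ \Big( \frac{1}{n^2b_1^3} \Big)^{\frac{4}{d+4}}, \Big( \frac{1}{n^3b_1^7} \Big)^{\frac{4}{2d+4}} \Big\}.
\]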

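We need now to compare the solution $b_0^*$ to the candidate $b_0^{**} = (u_n/n)^{1/(d+4)}$ obtained when $nb_0^{d+4} \to \infty$. For this, we must compare the orders of $R_n(b_0^*,b_1)$ and $R_n(b_0^{**},b_1)$. Since $R_n(b_0,b_1) \ge b_0^4$, we have $R_n(b_0^{**},b_1) \ge (u_n/n)^{4/(d+4)}$, so that, for $n$ large enough,

\[
\frac{R_n(b_0^*,b_1)}{R_n(b_0^{**},b_1)}
\le C \max\Big\{ \Big( \frac{1}{n^2b_1^3} \Big)^{\frac{4}{d+4}}, \Big( \frac{1}{n^3b_1^7} \Big)^{\frac{4}{2d+4}} \Big\} \Big( \frac{n}{u_n} \Big)^{\frac{4}{d+4}}
= O\Big( \frac{1}{u_nnb_1^3} \Big)^{\frac{4}{d+4}} + O\Big( \frac{1}{u_n} \Big)^{\frac{4}{d+4}} \Big( \frac{1}{n^{(d+8)}b_1^{7(d+4)}} \Big)^{\frac{4}{(2d+4)(d+4)}}
= o(1),
\]

using $u_n \to \infty$ and that $n^{(d+8)}b_1^{7(d+4)} \to \infty$ by $(A_{10})$. This shows that $R_n(b_0^*,b_1) \le R_n(b_0^{**},b_1)$ for $n$ large enough. Hence the Theorem is proved, since $b_0^*$ is the best candidate for the minimization of $R_n(\cdot,b_1)$. □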
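
The order obtained in this proof can be compared with a brute-force minimization. The sketch below is only a numerical sanity check under stated assumptions: it uses the expression of $R_n(b_0,b_1)$ with all implicit constants set to $1$, ignores the constraint $(A_9)$ on the grid of candidate values of $b_0$, and uses hypothetical function names.

```python
import numpy as np

def r_n(b0, b1, n, d):
    """R_n(b0, b1) as in Theorem 3.1, with all constants taken equal to 1."""
    a = b0**4 + 1.0 / (n * b0**d)
    t2 = ((n * b1**5) ** -0.5 + (b0**d / b1**3) ** 0.5) ** 2 * a**2
    t3 = (1.0 / b1 + (b0**d / b1**7) ** 0.5) ** 2 * a**3
    return b0**4 + t2 + t3

def b0_star_order(b1, n, d):
    """Order of b_0^* given by Theorem 3.2 (constants ignored)."""
    return max((n**2 * b1**3) ** (-1.0 / (d + 4)),
               (n**3 * b1**7) ** (-1.0 / (2 * d + 4)))

def b0_star_grid(b1, n, d, grid=None):
    """Minimizer of R_n(., b1) over a logarithmic grid of candidate bandwidths."""
    if grid is None:
        grid = np.logspace(-4, 0, 4001)
    return grid[np.argmin(r_n(grid, b1, n, d))]

n, d = 10**6, 1
b1 = n ** (-0.2)
print(b0_star_grid(b1, n, d), b0_star_order(b1, n, d))
```

Since $\asymp$ only fixes orders of magnitude, the two printed values are expected to agree up to a moderate constant factor, not exactly.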

Proof of Theorem 3.3

Recall that Theorem 3.2 gives

\[
AMSE(b_1) + R_n(b_0^*,b_1) \asymp r_1(b_1) + r_2(b_1) + r_3(b_1) = F(b_1),
\]

where

\[
r_1(h) = h^4 + \frac{1}{nh}, \qquad \arg\min r_1(h) \asymp n^{-1/5} = h_1^*, \qquad \min r_1(h) \asymp (h_1^*)^4 = n^{-4/5},
\]

\[
r_2(h) = h^4 + \frac{1}{n^{\frac{8}{d+4}}h^{\frac{12}{d+4}}}, \qquad \arg\min r_2(h) \asymp n^{-\frac{2}{d+7}} = h_2^*, \qquad \min r_2(h) \asymp (h_2^*)^4 = n^{-\frac{8}{d+7}},
\]

\[
r_3(h) = h^4 + \frac{1}{n^{\frac{12}{2d+4}}h^{\frac{28}{2d+4}}}, \qquad \arg\min r_3(h) \asymp n^{-\frac{3}{2d+11}} = h_3^*, \qquad \min r_3(h) \asymp (h_3^*)^4 = n^{-\frac{12}{2d+11}}.
\]

Each $r_j(h)$ decreases on $[0, \arg\min r_j(h)]$ and increases on $(\arg\min r_j(h), \infty)$, and $r_j(h) \asymp h^4$ on $(\arg\min r_j(h), \infty)$. Moreover $\min r_2(h) = o\big( \min r_3(h) \big)$ and $h_2^* = o(h_3^*)$ for all possible dimensions $d$, so that $\min\{ r_2(h) + r_3(h) \} \asymp (h_3^*)^4 = n^{-\frac{12}{2d+11}}$ and $\arg\min\{ r_2(h) + r_3(h) \} \asymp h_3^* = n^{-\frac{3}{2d+11}}$.

Observe now that $\min\{ r_2(h) + r_3(h) \} = O\big( \min r_1(h) \big)$ is equivalent to $n^{-\frac{12}{2d+11}} = O\big( n^{-4/5} \big)$, which holds if and only if $d \le 2$. Hence assume that $d \le 2$. Since $n^{-\frac{12}{2d+11}} = O\big( n^{-4/5} \big)$ also gives $\arg\min\{ r_2(h) + r_3(h) \} \asymp h_3^* = O(h_1^*)$, we have

\[
\min F(b_1) \asymp n^{-4/5} \quad \text{and} \quad \arg\min F(b_1) \asymp n^{-1/5}.
\]

The case $d > 2$ is symmetric with

\[
\min F(b_1) \asymp n^{-\frac{12}{2d+11}} \quad \text{and} \quad \arg\min F(b_1) \asymp n^{-\frac{3}{2d+11}}.
\]

This ends the proof of the Theorem. □
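
The exponents used in this proof all come from balancing the two terms of a function of the form $h^4 + n^{-a}h^{-c}$, whose minimizer is of order $n^{-a/(4+c)}$. A short sketch with exact fractions, using hypothetical helper names, reproduces the orders of $h_1^*$, $h_2^*$, $h_3^*$ and the threshold $d \le 2$.

```python
from fractions import Fraction

def balance(a, c):
    """For r(h) = h^4 + n^(-a) h^(-c): exponents (alpha, beta) with argmin ~ n^(-alpha), min ~ n^(-beta)."""
    alpha = Fraction(a) / (4 + Fraction(c))
    return alpha, 4 * alpha

def rates(d):
    """Return the (argmin, min) exponents of r_1, r_2 and r_3 for dimension d."""
    r1 = balance(1, 1)
    r2 = balance(Fraction(8, d + 4), Fraction(12, d + 4))
    r3 = balance(Fraction(12, 2 * d + 4), Fraction(28, 2 * d + 4))
    return r1, r2, r3

def optimal_rate_reachable(d):
    """True when min r_3 is at most of the order of min r_1 = n^(-4/5)."""
    r1, _, r3 = rates(d)
    return r3[1] >= r1[1]

for d in (1, 2, 3):
    print(d, rates(d), optimal_rate_reachable(d))
```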



Proof of Theorem 3.4

Observe that the Tchebychev inequality gives 

n 









a)] 







so that 



where 



Therefore 



/m(e) 







1 + Op 









/n(e) 



/n(e), 



gj - e 



fm(e) - E/ n (c) - (/„(e) - E/ n (e)) + (/ ln (e) - / ln (e)) + O p f n (e). (3.6.14) 



Let now /in(e) be as in Lemma 3.5 and note that / n (e) = (3-A 1 ) S<=i /m( e )- The second and the 



third claims in Lemma 3.5 yield, since 61 goes to under (Aiq), 



Er=i E l/»»(e)-E/m(e)l < 1|A 



(£r=iVar/ in ( £ ) r 



0(60 = 0(1). 



F(i^„) tl _/ ^i(f)d« + op- 
tionee the Lyapounov Central Limit Theorem gives, since nb\ diverges under (A10), 



/ n (e)-E/ n (e) = / n (e)-E/ n (e) ^ 
vA^ar^e) / Var/,„(e) 

V 

which yields, using the second equality in Lemma |3.5[ 

/(e) 



AT (0,1), 



nb! (/„(e) - E/„(e)) A ( 0, ro( ^ v ^ / 



(3.6.15) 



Moreover, note that for nb^bf — > 00 and ?i&Q d — » 00, 
Therefore, since by Assumptions $(A_{11})$ and $(A_9)$, we have $b_0^4 = O\big( 1/(nb_0^d) \big)$, $nb_0^db_1^3 \to \infty$ and $nb_0^{2d} \to \infty$, the equality above and (3.6.9) then give

1/2 



/ln(e) - /in(e) 



Op 



6$ 



Op I bt 



1 

- ^ 


1 


n 




1 


1 


n 


n 2 b^jbl 



1 , &g 



1 



1 , b d 



1 



1/2 



Hence for b\ going to 0, we have 

y/nbi f/in(e) - /in(e)J 



-,1/2 



op(l), 



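since $nb_0^4b_1 = o(1)$ and $nb_0^db_1^3 \to \infty$ under Assumption $(A_{11})$. Combining the above result with (3.6.15) and (3.6.14), we obtain

\[
\sqrt{nb_1}\,\Big( \hat f_{1n}(e) - \mathbb{E} f_n(e) \Big) \xrightarrow{d} N\Big( 0, \frac{f(e)}{\mathbb{P}(X \in \mathcal{X}_0)} \int K_1^2(v)\,dv \Big).
\]

This ends the proof of the Theorem, since the first result of Lemma 3.5 gives

\[
\mathbb{E} f_n(e) = \mathbb{E} f_{1n}(e) = f(e) + \frac{b_1^2}{2} f^{(2)}(e) \int v^2K_1(v)\,dv + o\big( b_1^2 \big) := \bar f_{1n}(e). \qquad □
\]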

Appendix A : Proof of the intermediate results

Proof of Lemma 3.1

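First note that by $(A_7)$, we have $\int zK_0(z)\,dz = 0$ and $\int K_0(z)\,dz = 1$. Therefore, since $K_0$ is continuous and has a compact support, $(A_1)$, $(A_2)$ and a second-order Taylor expansion yield, for $b_0$ small enough and any $x$ in $\mathcal{X}_0$,

\[
\big| \bar g_n(x) - g(x) \big|
= \Big| \int K_0(z) g(x + b_0z)\,dz - g(x) \Big|
= \Big| \int K_0(z) \big[ g(x + b_0z) - g(x) \big] dz \Big|
\]
\[
= \Big| \int K_0(z) \Big[ b_0g^{(1)}(x)z + \frac{b_0^2}{2} zg^{(2)}(x + \theta b_0z)z^{\top} \Big] dz \Big|, \qquad \theta = \theta(x, b_0z) \in [0,1],
\]
\[
= \Big| b_0g^{(1)}(x) \int zK_0(z)\,dz + \frac{b_0^2}{2} \int zg^{(2)}(x + \theta b_0z)z^{\top}K_0(z)\,dz \Big|
= \frac{b_0^2}{2} \Big| \int zg^{(2)}(x + \theta b_0z)z^{\top}K_0(z)\,dz \Big|
\le Cb_0^2,
\]

so that

\[
\sup_{x \in \mathcal{X}_0} \big| \bar g_n(x) - g(x) \big| = O\big( b_0^2 \big).
\]

This gives the first equality of the lemma. To prove the two last equalities in the Lemma, note that it is sufficient to show that

\[
\sup_{x \in \mathcal{X}_0} \big| \hat g_n(x) - \bar g_n(x) \big| = O_{\mathbb{P}}\Big( \frac{\ln n}{nb_0^d} \Big)^{1/2},
\]

since $\bar g_n(x)$ is asymptotically bounded away from $0$ over $\mathcal{X}_0$ and $|\bar g_n(x) - g(x)| = O(b_0^2)$ uniformly for $x$ in $\mathcal{X}_0$. This follows from Theorem 1 in Einmahl and Mason (2005). □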

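Proof of Lemma 3.2

For the equality in the lemma, set

\[
\hat r_n(x) = \frac{1}{nb_0^d} \sum_{j=1}^n Y_jK_0\Big( \frac{X_j - x}{b_0} \Big), \qquad \bar r_n(x) = \mathbb{E}\big[ \hat r_n(x) \big],
\]

and observe that

\[
\sup_{x \in \mathcal{X}_0} \big| \hat m_n(x) - m(x) \big| \le \sup_{x \in \mathcal{X}_0} \Big| \hat m_n(x) - \frac{\bar r_n(x)}{\bar g_n(x)} \Big| + \sup_{x \in \mathcal{X}_0} \frac{1}{|\bar g_n(x)|} \big| \bar r_n(x) - \bar g_n(x)m(x) \big|. \tag{A.1}
\]

Consider the first term of (A.1). Note that $\mathbb{E}^{1/4}\big[ Y^4 | X = x \big] \le |m(x)| + \mathbb{E}^{1/4}\big[ \varepsilon^4 \big]$. The compactness of $\mathcal{X}$ from $(A_1)$, the continuity of $m(\cdot)$ from $(A_3)$ and $(A_4)$ then give that $\mathbb{E}\big[ Y^4 | X = x \big] < \infty$ uniformly for $x \in \mathcal{X}_0$. Hence under $(A_9)$, Theorem 2 in Einmahl and Mason (2005) gives

\[
\sup_{x \in \mathcal{X}_0} \Big| \hat m_n(x) - \frac{\bar r_n(x)}{\bar g_n(x)} \Big| = O_{\mathbb{P}}\Big( \frac{\ln n}{nb_0^d} \Big)^{1/2}.
\]

For the second term in (A.1), a second-order Taylor expansion gives, as in the proof of Lemma 3.1,

\[
\sup_{x \in \mathcal{X}_0} \big| \bar r_n(x) - \bar g_n(x)m(x) \big| = O\big( b_0^2 \big).
\]

This gives the result of the lemma since Lemma 3.1 implies that $\bar g_n(x)$ is bounded away from $0$ over $\mathcal{X}_0$ uniformly in $x$ and for $b_0$ small enough. □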

Proof of Lemma 3.3

Note that under (As), the Taylor expansion with integral remainder gives, for any x G Xq 

and any integer i G [1, n], 

(Y i -fh n {x)-e\ (Yi-m{x)-e\ I f 1 w (Y, - 9 n (x, t) \ 
Kl { h x ) =Kl { hi )- Vi (m n (x)-m( X ))J o K{ ^ dl. 

where 9 n (x, t) = m(x) + e + t (fh n (x) — m(x)). Therefore 
m n (x) — m(x) 



fn(z\x) = fn(e\x) 



9n{x) 



h 



hi 



(A.2) 



Now, observe that if = z and y G M, the change of variable e = y — m(z) + h\v gives, under 
(A ± ) - (A 5 ) and (A 7 ), 



(i) (Yi-y 



hi 



= E 



(1) (e l + m{z) - y 



(i) / e + m(z) -y 



hi 



hi 

f(e)de 



hi / \K{ 1] (v)\f ((y - m(z) + h t v))dv < Gh x . 



Hence 



f E„ 


*f > ( 







dt < Chi. 



With the help of this result and Lemma |3TTj we have 
E, 



Xj-x \ r 1 (1) ( Yj-e n (x,t) 

ho J Jo 1 I hi 



< 1 



Ko 



Xi - x 



< — V 

~ nhi e-f 

u Z — 1 



ho 

Xi - x 



x sup / E r 

l<i<n Jo 



o P (i), 



A 



(i) ( Yj- 9 n (x,t) 
hi 



dt 



so that 



"Hohlfy °\ h JJo 1 \ hi ) \h x 



Hence from (A.2 1, ( 3.6. 8| , Lemma 3.2 and Assumption (Aq), we deduce 



./„ek> = ./>k> + ". ( /; ' ) (h: : • ^ 



Proof of Lemma 3.4 and Lemma 3.5



1/2 



f n (e\x) + o 



1 



nhthi 



1/2 



.□ 



We just give the proof of Lemma 3.4, the proof of Lemma 3.5 being very similar. For the first equality of Lemma 3.4, note that

1 



E[ip in (x,y)] = 



K ° \ h / Kl \ hi / Vixi.yijdxidy! 
Ko(z)Ki(v)(p [x + hoz, y + hiv) dzdv. 
A second-order Taylor expansion gives under $(A_6)$, for $z$ in the support of $K_0$, $v$ in the support of $K_1$, and $h_0$, $h_1$ small enough,



ip (x + h a z, y + h\v) - <p(x, y) 

dip(x,y) T d<p{x,y) 
— ho — z + hi v 



dx 



hi d 2 (p(x + 9h z,y + 9h\v) T 

~T Z Wx z 

h\ d 2 (p(x + 9h z, y + Ohiv) 2 



, , d 2 ip(x + 0h o z,y + Ohiv) T 

hih v — — z 

oxoy 



2 d 2 y 

for some 9 = 6{x,y,hoZ,hiv) in [0,1]. This gives, since jKo(z)dz = jKi(v)dv — 1, J zKo(z)dz 
and J vKi(v)dv vanish under (A-/) — (Ag), and by the Lebesgue Dominated Convergence Theorem, 

E[^ n ( X ,y)]- 9 (x,y)-^ J zKo (z)z T dz-§^^l J v 2 Ki{v)dv 



h 2 
"o 



+hihc 

4 

o(h 2 + hi) 



d 2 (p(x + 9h z,y + 6h\v) d 2 (p(x,y) 



d 2 x d 2 x 
d 2 (p(x + 9hoz, y + 0h\v) d 2 <p(x,y) 



z T K {z)Ki(v)dzdv 



dxdy 



dxdy 



z T Ko(z)Ki(v)dzdv 



d 2 ip(x + 9h z,y + 9hiv) d 2 (p(x,y) 



d 2 y 



d 2 y 



v 2 Ko (z) K\ (v) dzdv 



This proves the first equality of Lemma 3.4 The second equality in Lemma follows similarly, since 



Vax[(p in (x,y)] = E[ipf n {x,y)] - (E [ip m (x, y)]Y 
1 

J^hi 



ip{x + h z, y + hiv) Kg(z)K 2 (v)dzdv + 0(1) 



<p{y,y) 

Khi 



K 2 (z)K 2 (v)dzdv + o 



1 



h d hi 



The last statement of Lemma 3.4 is immediate, since the Triangular and Convex inequalities 



give 



E\ip in (x,y) -E(p ln (x,y)\ 3 < CE\Lp in (x,y)[ 



< 



Cy{x,y) 
h 2d h\ 



\K (z)Ki(v)\ 3 dzdv + o 1 '-' 



Proof of Lemma 3.6

The order of $S_n$ follows from Lemma 3.8 and Lemma 3.9. In fact, since



l(Xi G Xo) (fh in - m(X i )) = 1( - X ! d t Xo) V (m(X j )+e j -m(X i ))K^ 



Xj — Xi 
bo 



ft 



Lemma 3.8 and Lemma 3.9 



give 



Sn — Op 



(nb 2 + {nbi) 1 ' 2 



nbi+ b ^ 



1/2 



b d o) 
which gives the result for $S_n$.



For T n , define for any 1 < i < n, 



E,- n [-] = En [-X"i, . . .,X n ,e k ,k ^ i] 



Therefore, since (fhi n — m{Xi)) depends only upon {X\, . . . , X n , k ^ i), we have 



E„[T„ 



E, 



E, 



(2) I ei-e 



n 

5^ 1 (Xi G Af ) (m in - m(X t )) 2 E 



A" 



with, using (A4) and Lemma 3. 7 (3. 6. 2) 

,(2) t'Si — t 



E, 



A 



6x 



A ( f! 6i f ) 



Hence this bound, the equality above, the Cauchy-Schwarz inequality and Lemma [3. 10| yield that 



|E„[T„]| < CblJ2 E n 



1 {Xi G X ) (fh in - m(Xi)f 



< Cnbi ( sup E Ti 

0-<i<n 



< 



Op (nbl) bl + 



1 {Xi G X Q ) {fh in - m{Xi)) 4 



1 



1/2 



(A.3) 



For the conditional variance of T n , Lemma 3.12 gives 

n n n 

Var„(T„) = ^ Var„ (Q n ) + ^ ^ Cov„ Cin) 



i=l 



1=1 3=1 



O v {nb x ) ( 6* ' '" 



o,(„W 2 )(^ + 4) ; 



Therefore, since b\ goes to under {A w ), this order and (A.3) yield, applying the Tchebychev 
inequality, 



T„ 



Op 
F 



K )( 4} + ^) +( „ M V 2 ( 6} + ^ +( „ W a)^( t J + ^ 



nby 



which gives the result for T n . 

We now compute the order of R n . For this, define 

I in = f '{1-tfK^ ( ^-^in-m{X i ))- < 



Ri 



t{X l G X ){m in -m(Xi)) 3 A, 
and note that $R_n = \sum_{i=1}^n R_{in}$. The order of $R_n$ is derived by computing its conditional mean and its conditional variance. For the conditional mean, observe that



E n [R n ] = E n 



E, 



J2 E in [R in \ 
.i=l 
n 

1 {*i G X ) (m m - m(X t )) 3 E m [I in ] 



with, using (A4) and Lemma 3.7 (3.6.3) 



|E m [I m }\ 



(1-tf 



K 



(3) ( e - t(m in - m(Xj)) - e 



f(e)de 



dt 



< Cb\. 

Therefore the Holder inequality and Lemma [3 . 1 0| yield 

n 

|E„ [Rn]] < Cb\ E„ [|1 {Xi € Xo) (m m - m(Xi))\ 3 
»=i 

n 

< Cb\ [l (-Xi G Af ) (m m - m(JSQ))' 

< + ^ 



3/2 



For the conditional covariance of R n , note that Lemma |3 . 1 1 1 allows to write 

n n n / \ 

Var„ {R n ) = £ Var„ (i? m ) + ( ||^ - Xj\\ < Cb ) Cov„ (R ln ,R jn ) , 



i=i 3=1 



and consider the first term in (A.5|. We have 

Var„ (R m ) < E n [R 2 m ] < E n 1 (X t e X ) (m m ~ m(X^) 6 E m [l 2 m ] 
with, using (A4), the Cauchy-Schwarz inequality and Lemma |3.7 ( |3.6."3 ), 



E m [lf n ] < CE in 



1 ^(3) ( gj - ^("T-m - m(Xj)) - 



■ c 



< Ch, 



K 



( 3 ) /e - f(m in - m(Xi)) 



bi 



dt 



f(e)de 



dt 



so that 



Var„ (i? m ) < C&!E n [l (X, e X ) (m in - m(X 2 )) 6 ] . 
Therefore form Lemma |3.1Q| we deduce 

71 

Y Var„ (R ln ) < Cnb x sup E„ [l (Xi G Afc) (m in - m(A 4 )) e 



< O r {nb 1 )[b* + 



(A.4) 



(A.5) 



(A.6) 
For the second term in (A.5), the Cauchy-Schwarz inequality gives, with the help of the above result for $\mathrm{Var}_n(R_{in})$,

1/2 

|Cov n (Rin, Rjn)\ < (Vax„ (Rin) Var„ (R jn )) 

< Cbx sup E„ [1 (X t e X ) (m in - m(Xi)) e ] . 

l<i<n 

Hence by Lemma |3.10| and the Markov inequality, we have 

n n / \ 



1 = 1 3 = 1 



< 



0p(M ( 6 " + i) EE(ii^-^ii<^o) 

< P ( bl )[bt+^j 3 (nX)- 



This order, (A. 6) and (A.5) give, since nb^ diverges under (Ag), 

3 



Var (R n ) = Op lb] 



1 



(nXh) 



Finally, with the help of this result, (A.4) and the Tchebychev inequality, we arrive at 

3/2 



Rn — Or 

= o P 

Proof of Lemma 3.7



3/2' 



(nb\ + (nH^bx) 1 



^ ' bt 



rib* 



3/2' 



.□ 



Set h p (e) = e p /(e), p € [0,2]. For the first inequality of (3.6.1), note that under (A$) and 
(Ag), the change of variable e = e + b\v give, for any integer I G [1, 3], 

Kf (^)\vf(e)de 



= bxj K^ivfhpie + b^dv 
< 6isup|Ap(t)| (\Kf\vf\dv 



ten 
< Cb u 



(A.7) 



which yields the first inequality in ( 3.6. 1[ ). For the second inequality in (3.6.1), observe that /(•) 
has a bounded continuous derivative under (A§), and that J K[ e \v)dv — under (Ag). Therefore, 
since h p (-) has bounded second order derivatives under (At), the Taylor inequality yields that 



= h 



K[ e \v) [h p (e + hv)-h p (e)} 



dv 



< bfsup\h p V(t)\ / \vK[ e) (v)\dv < Cbj. 
ten, ' 



which completes the proof of (3.6.1 1 
The first inequalities of (3.6.2) and (3.6.3) follow directly from (A.7). The second bounds in (3.6.2) and (3.6.3) are proved simultaneously. For this, note that for any integer $\ell \in \{2,3\}$,

J K[ e) ( 6 ^- e ) h p (e)de = b, J {v)h p {e + b lV )dv. 

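Under (A8), $K_1(\cdot)$ is symmetric, has a compact support and two continuous derivatives, with $\int K_1^{(\ell)}(v)\,dv = 0$ and $\int v K_1^{(\ell)}(v)\,dv = 0$ for $\ell \in \{2,3\}$. Hence, since by (A5) $h_p$ has bounded continuous second order derivatives, this gives, for some $\theta = \theta(e, b_1 v)$,

$$\left| b_1 \int K_1^{(\ell)}(v) \left[h_p(e + b_1 v) - h_p(e)\right] dv \right| = \left| b_1 \int K_1^{(\ell)}(v) \left[ b_1 v\, h_p^{(1)}(e) + \frac{b_1^2 v^2}{2}\, h_p^{(2)}(e + \theta b_1 v) \right] dv \right|$$

$$= \frac{b_1^3}{2} \left| \int v^2 K_1^{(\ell)}(v)\, h_p^{(2)}(e + \theta b_1 v)\,dv \right| \le \frac{b_1^3}{2} \sup_{t \in \mathbb{R}} \left|h_p^{(2)}(t)\right| \int \left|v^2 K_1^{(\ell)}(v)\right| dv \le C b_1^3. \qquad \square$$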



Proof of Lemma 3.8



Assumption (A^) and Lemma 3. 7 (3. 6.1) give 



Var r , 



(1) / ti 



Si - e 



(i) 



1=1 



£i - e 



E 



(1) 



e — e 



< 



i=l 



A" 



(1) 



e — e 



< Cn^ max |/3 in | , 
KKn 



< Cn6i max Ift. 

Ki<n 



Hence the (conditional) Markov inequality gives 



Eft«^i 



(i) 



i=l 



O p fn6^ + (n&i) 1 / 2 ) max 

V / Ki<n 



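so that the lemma follows if we can prove that

$$\sup_{1 \le i \le n} |\beta_{in}| = O_P\left(b_0^2\right), \qquad (A.8)$$

as established now. For this, define

$$\zeta_j(x) = \mathbb{1}(x \in \mathcal{X}_0)\,(m(X_j) - m(x))\, K_0\left(\frac{X_j - x}{b_0}\right), \qquad \nu_{in}(x) = \frac{1}{(n-1) b_0^d} \sum_{j=1,\, j \ne i}^n \left(\zeta_j(x) - E[\zeta_j(x)]\right),$$

and $\bar\nu_n(x) = E[\zeta_j(x)]/b_0^d$, so that

$$\beta_{in} = \frac{n-1}{n}\, \frac{\nu_{in}(X_i) + \bar\nu_n(X_i)}{\hat g_{in}}.$$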



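For $\max_{1 \le i \le n} |\bar\nu_n(X_i)|$, first observe that a second-order Taylor expansion applied successively to $g(\cdot)$ and $m(\cdot)$ gives, for $b_0$ small enough, and for any $x$, $z$ in $\mathcal{X}$,

$$\left[m(x + b_0 z) - m(x)\right] g(x + b_0 z) = \left[ b_0\, m^{(1)}(x) z + \frac{b_0^2}{2}\, z\, m^{(2)}(x + \zeta_1 b_0 z)\, z^\top \right] \left[ g(x) + b_0\, g^{(1)}(x) z + \frac{b_0^2}{2}\, z\, g^{(2)}(x + \zeta_2 b_0 z)\, z^\top \right]$$

for some $\zeta_1 = \zeta_1(x, b_0 z)$ and $\zeta_2 = \zeta_2(x, b_0 z)$ in $[0,1]$. Therefore, since $\int z K_0(z)\,dz = 0$ under (A7), it follows that, by (A1), (A2) and (A3),

$$\max_{1 \le i \le n} |\bar\nu_n(X_i)| \le \sup_{x \in \mathcal{X}_0} |\bar\nu_n(x)| = \sup_{x \in \mathcal{X}_0} \left| \int \left(m(x + b_0 z) - m(x)\right) K_0(z)\, g(x + b_0 z)\,dz \right| \le C b_0^2. \qquad (A.9)$$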



Consider now the term $\max_{1 \le i \le n} |\nu_{in}(X_i)|$. The Bernstein inequality (see e.g. Serfling (2002)) and (A4) give, for any $t > 0$,



P (msx\v in (Xi)\ >t) < ^FQ Vin (Xi)\ > t) < [ 1 



\vin{x)\ >t\Xi = x)g(x)dx 



< 2nexp — - 



(n - l)i 2 



2sup aeA , Var(C J -^)/^) + ||^' 

where M is such that sup^g^ | Cj C^) I < M. The definition of Xq given in (^2), (-^3)1 (A7) and the 

standard Taylor expansion yield, for bo small enough, 

1 r Cb 2 

sup \Q(x)\ < Cb , sup Var(Cj(;r)/6o) < 72 SU P / ( m ( x + - m(a;)) 2 #0 (z)g{x+b z)dz < , 

so that, for any t > 0, 



P ( max \v m (Xi)\ >t)< 2nexp ( - 



This gives 



... , Un 111 ft 



1/2 \ / 

i < In exp 



(n-l)^ 2 /bg 
C + Ct/b 



t 2 hi? 



provided that t is large enough and under (Ag), It then follows that 

( \P" In n \ ^ 
maxJ. m (X0|=O P U^ . 



1/2 



This bound, (A.9) and Lemma 3.1 show that (A.8) is proved, since b\ In n/(nbg) = O under 
(A9), and that 



a _ n-1 V tn {Xj) + V„{Xj) 
Pin — ^ •'— ' 

n gin 



Proof of Lemma 3.9



Note that (A4) gives that $\Sigma_{in}$ is independent of $\epsilon_i$, and that $E_n[\Sigma_{in}] = 0$. This yields



E, 



(1) / £i 



= 0. 



(A. 10) 



Moreover, observe that 



Var„ 



.1=1 
n 

X] Var " 



£i — e 



2 = 1 



y ■ k 



&1 

(1) / £i - e 
i=l 3=1



6l 



6l 



(A.ll) 
For the sum of variances in (A.11), Lemma 3.7-(3.6.1) and (A7) give



E Var r , 



i=i 



y. K W 



where $\sigma^2 = \mathrm{Var}(\epsilon)$ and



£i - e 



i=l 

< ^EE 



(l) f £i-e 
6i 



1(-X"j G An) 2 ^ Xj — Xj 
5 A o 



n& ~ i 

U 2—1 



'/r 



u .7 = 1,7^1 



For the sum of conditional covariances in (A.11), observe that by (A4) we have



(A.12) 



=1 j=i 



EE 

:1 

n 1 

EE E 



Aj 



(1) 



ft ~ e 
61 



(1) / e J 
1 = 1 3 = 1



EE 

i=i 3=1 



V- V- Jo 



6l 



(i) ( £i ~ M ^(i) f £ J ■ - e 



I(A, G An)I(X 3 G A 
(nb$) 2 g m g jn 



EE^ 

fc=i £=i 
An



Xi — Xj 
bo 



.(1) 



where 

= £fcA^ 

Moreover, under (A4), it is seen that for k ^ £, E[£fcj£^j] = when Card{i, j, fc, £} > 3. Therefore 
the symmetry of An yields that 



Sj-e 



E E Cov « 



i=i 3=1 

3Vi 



(l) / £i 



6l 



/<' 1 

1 ^jn 1 ^ 1 



(1) £ 3 



£j — e 
EE

s=i 3=1 

3#i 



1(A; G A )1(X, g A ) 2 (Xj - X 



QinQjn 



b 



S 2 



eA} 



(1) 



EE 

»=i 3=1 

3Vi 



I(X a G XqMX, G A ) 
(nb$) 2 g in g jn 



\ b Q 



A 



e — e 



Afe — Aj 



E[e 2 ]E 2 



A 



(i) 



E — e 



Therefore, since 

sup ^r- 

l<j<n \ \9jn\ 

by Lemma 3.1, Lemma 3. 7 (3. 6.1) and (A4) then give 

n n 

EE Cov 



1=1 3=1 



= Op 



^in Kl ( ^ ) 7 5]j„ A| 



Pp(l) 



(1) ( £ J- e 

bi 



bf\^ l(Xi G Xo)9i 



nbt . . 



E 



+ Or(bf)J2 



^ 1(A, G A )| 5i . 



i=l 



(A.13) 
where $g_{in}$ is defined as in (A.12) and



The order of the first term in (A.13) follows from Lemma 3.1 which gives



Et(Xi G X )g in 
ps— i = Op(n). 



i=l 



I?/-, 



(A.14) 



Again, by Lemma 3.1, we have



E 1{Xi |^ )|gml = <MU El(^e * ) b 

i=l ' 5m ' i=l 

with, using the changes of variables $x_1 = x_3 + b_0 z_1$, $x_2 = x_3 + b_0 z_2$,



E 



E 1 ( x > e *b) Iff* 



i=l 



Cn 3 
Cn 3 



< 



n 2 /i 2 j ^3 
Cn 3 6 2d 



A3 — A2 



2:3 - x 2 



Y\_ g{x k )dx k 



k=l 



These bounds and the equality above, give under (A2) and (A7), 

^ IjXj e X )\g m \ 

h m p(n) - 



Hence from (A.14), (A.13), (A.12), (A.11) and Lemma 3.1 we deduce, for $b_1$ small enough,
Var„ 



E 



(i) ( e,-e 



= o P 



= o P 



t(Xi e X )g in _ / b\ \ ^ l(A j; e A!b)flf <B , _ ,. 4 . Al(^€ ,*b)|ifc 



E 



E 



o r (bt)j2 



\9i: 



K bt 



Finally, this order, (A.10) and the Tchebychev inequality give



b d 



1/2 



Proof of Lemma 3.10



Define $\beta_{in}$ as in Lemma 3.8 and set
1 " 

3™ = ^ E *i 



,4 / Aj — A,; 



nb\ 



3 E *» 



2 / -^3 — 



j=i,j¥« 

The proof of the lemma is based on the following bound : 



b 



1 (A, e # ) (m m - to(A 4 ))* 



< C 



/3 + 



(n6g)(*/a)g* n 



/ce{4,6}. (A.15) 
Indeed, taking successively $k = 4$ and $k = 6$ in (A.15), we have, by (A.8), Lemma 3.1 and (A9),

2 



sup E r 

Ki<n 



sup E„ 

Ki<n 



1 (X, G Xo) (m m - m(A,)) 4 



1 (X t G Xo) (m in - m(Xt)y 



= ° r{b ^ + ^w) =0p { bt + ^J ' 



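which gives the results of the lemma. Hence it remains to prove (A.15). For this, define $\beta_{in}$ and $\Sigma_{in}$ respectively as in Lemma 3.8 and Lemma 3.9. Since $\mathbb{1}(X_i \in \mathcal{X}_0)(\hat m_{in} - m(X_i)) = \beta_{in} + \Sigma_{in}$, and since $\beta_{in}$ depends only on $(X_1, \ldots, X_n)$, this gives, for $k \in \{4, 6\}$,

$$E_n\left[\mathbb{1}(X_i \in \mathcal{X}_0)\,(\hat m_{in} - m(X_i))^k\right] \le C \beta_{in}^k + C\, E_n\left[\Sigma_{in}^k\right]. \qquad (A.16)$$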



The order of the second term of bound (A.16) is computed by applying Theorem 2 in Whittle (1960) or the Marcinkiewicz-Zygmund inequality (see e.g. Chow and Teicher, 2003, p. 386). These inequalities show that for a linear form $L = \sum_{j=1}^n a_j \zeta_j$ with independent mean-zero random variables $\zeta_1, \ldots, \zeta_n$, it holds that, for any $k \ge 1$,

$$E\left|L^k\right| \le C(k) \left[ \sum_{j=1}^n a_j^2\, E^{2/k}\left|\zeta_j^k\right| \right]^{k/2},$$

where $C(k)$ is a positive real depending only on $k$. Now, observe that for any $i \in [1, n]$,

v _ — 1 (Xj G Xq) ( ' Xj — Xi 



Since under (A4), the $\sigma_{jin}$'s, $j \in [1, n]$, are centered independent variables given $X_1, \ldots, X_n$, this yields, for any $k \in \{4, 6\}$,

k/2 



E„ [Sf„] < CE [s k ] 



1 (Ai G Xq) r ^2 ( ^"i — -^i 



3=1 



Hence this bound and (A.16) give
E„ 



t(Xi € X ) (m in - m(Xi)y 



b 



< C 



< 



CI (A, G Xq) g. 
(nb^)( k / 2 )g. 



k/2 



k , t(x t eXo)g:, 

Pin 



k/2 



{nbi)W)gl 



which proves (A.15), and then completes the proof of the lemma. 



□ 



Proof of Lemma 3.11

Since $K_0(\cdot)$ has a compact support under (A7), there is a $C > 0$ such that $\|X_i - X_j\| \ge C b_0$ implies that for any integer $k$ of $[1, n]$, $K_0((X_k - X_i)/b_0) = 0$ if $K_0((X_j - X_k)/b_0) \ne 0$. Let $D_j \subset [1, n]$ be such that an integer $k$ of $[1, n]$ is in $D_j$ if and only if $K_0((X_j - X_k)/b_0) \ne 0$. Abbreviate $P(\cdot \mid X_1, \ldots, X_n)$ into $P_n$ and assume that $\|X_i - X_j\| \ge C b_0$, so that $D_i$ and $D_j$ have an empty intersection. Note also that taking $C$ large enough ensures that $i$ is not in $D_j$ and $j$ is not in $D_i$. It then follows, under (A4) and since $D_i$ and $D_j$ only depend upon $X_1, \ldots, X_n$,



£i \ eA 



{fh in - m(Xi),Si) G A and (%„ - m(Xj),Sj) G B 

/ / E fcggAW M^fc) - mpfi) + e k ) K Q ((Xk - X^/bp) 



and I =^ — — 7t~tz ; ,. — ; , £j I € is 



ZeeD d \{j} K ° ^ X i ~ X o)l h ^>) 
(( Eke Di \{i} M*fc) - m (Xj) + e fc ) K {{X k - Xi)/bo) \ \ 

{{ ZkzDMiyKoUXk-xj/bo) > £l ) e ) 

X n {{ ZeeDMiyKodXe-X^/bo) 
((m m - m{X i ),e i ) £ A) x P„ ((%„ - m{X j \e j ) G B) . 



£j G £ 



This gives the result of Lemma 3.11 since both $(\hat m_{in} - m(X_i), \epsilon_i)$ and $(\hat m_{jn} - m(X_j), \epsilon_j)$ are independent given $X_1, \ldots, X_n$. $\square$
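
The disjointness argument behind Lemma 3.11 can be checked numerically. The sketch below is only an illustration in dimension $d = 1$: it assumes that $K_0$ vanishes outside $[-1/2, 1/2]$ (the normalisation used in Chapter 4), in which case $C = 1$ is admissible by the triangle inequality. The helper `neighbours`, the sample size and the bandwidth are hypothetical choices made for the example.

```python
import random


def neighbours(j, xs, b0):
    """Index set D_j: all k with |X_j - X_k| <= b0/2, i.e. K0((X_j - X_k)/b0) may be nonzero."""
    return {k for k, xk in enumerate(xs) if abs(xs[j] - xk) <= b0 / 2.0}


rng = random.Random(1)
n, b0, C = 200, 0.1, 1.0
xs = [rng.random() for _ in range(n)]
D = [neighbours(j, xs, b0) for j in range(n)]

# Count pairs that are far apart (|X_i - X_j| > C*b0) but whose
# neighbourhoods overlap, or where one index falls in the other's set.
violations = 0
far_pairs = 0
for i in range(n):
    for j in range(i + 1, n):
        if abs(xs[i] - xs[j]) > C * b0:
            far_pairs += 1
            if (D[i] & D[j]) or (i in D[j]) or (j in D[i]):
                violations += 1

print(f"far pairs: {far_pairs}, violations: {violations}")
```

With disjoint index sets, the two leave-one-out estimators are built from disjoint groups of error terms, which is what makes them conditionally independent.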



Proof of Lemma 3.12

Since $\hat m_{in} - m(X_i)$ depends only upon $(X_1, \ldots, X_n, \epsilon_k, k \ne i)$, we have



E Var » it™) < E E » [cS»] = E E « 



l(XiG A'o)(m in -m(X i )) 4 E i 



(2) / e< - e 
with, using Lemma 3.7-(3.6.2),



E, 



(2) / £j - e 



Therefore these bounds and Lemma 3.10 give



E Var «(Cm) < C6x^E r 



1 (X, G *„) (m m - m(^)) 4 



< Cn6i sup E„ 

KKn 



1 (X, G * ) (m m - mpQ)) 4 
1 ^ 2 



which yields the desired result for the conditional variance. 

We now prepare to compute the order of the conditional covariance. To that aim, observe that Lemma 3.11 gives



i=l 3=1 



i=i j=i 



E CoV « (Cin, C,n) =EE 1 (ll^-^ll< ) ( E » ~ En [Cin] E„ [C. 
The order of the term above is derived from the following equalities:



EE 1 !^ - Xj\\ < Cb )E n [0„] E n [Q n ] 

n n / \ 
J2 E 1 ( - X i II < C6 « ) E » [CinCin] 



P (n 2 6^i) & 4 



(A.17) 



i=l 3=1 

35« 



P (nX^ /2 )(^ + ^) 2 . (A.18) 



Indeed, since $b_1$ goes to 0 under (A10), (A.17) and (A.18) yield that

2 



EE C ° V " (CiniCjn) = Op 



1=1 i = i 



which gives the result for the conditional covariance. Hence, it remains to prove (A.17) and (A.18). 



For (A.17), note that by (A1) and Lemma 3.7-(3.6.2), we have
|E„[Ci„]| = E„ 1 (Xi e X ) (m in - m(X i )) 2 E J 



(2) / Si 



£;, - e 



< Cb\ E„ 1 {Xi e X ) (m m - m(Xi)) 4 



1/2 



Hence from this bound and Lemma 3.10 we deduce



sup |E„[Ci„]E n [( jn ]\ < Cb\ sup E n 

1 < i , j < n 1 < i < n 



< Op (6?) ( 6* + 
Therefore, since the Markov inequality gives 

n n / 

EE 1 ! H Xi - x ^ < Cb ° ) = ^X), 



1 (Xi e Ab) (m in - m(A^)) 4 



i=l 3=1 

3V» 



(A. 19) 



it then follows that 



E E 1 (ll^ - ^11 < Cfe o) E ™ E » [On] = Op {n 2 b d bt) (b 4 + 



which proves (A.17). 



For (A.18), set $Z_{in} = \mathbb{1}(X_i \in \mathcal{X}_0)\,(\hat m_{in} - m(X_i))^2$, and note that for $i \ne j$, we have

E777, Zj n K-^ 



En [CinCjn] 



(2) / £j 



£1 — e 
(2) / e t



Ei - e 



61 



(A.20) 



where 



(2) / £i 



e,: - e 



= /9l„E 



&i 



v - 



(2) 



£» - e 
~&i~ 



■Ei 



y2 v-CO f gj ~ e 



(A.21) 
The first term of Equality (A.21) is treated by using Lemma 3.7-(3.6.2). This gives

,(2) (si-e 



K 



:\ ,32 



(A.22) 



Since under (A4), the $\epsilon_j$'s are independent centered variables, and are independent of the $X_j$'s, the second term in (A.21) gives
E 



y - k v 



(2) ( £i - e 



I (Xj G A'q) ^ / A fc - Aj _. 



fc=i,fe^j 



(2) 



61 



(2) / £• 



e, - e 
Therefore, by (A7) which ensures that $K_0$ is bounded, the equality above and Lemma 3.7-(3.6.2) yield that

1 {X.j G # ) 



(2) / £i - e 



< Cbi 



lb o9j 



For the last term in (A.21), we have
Ei 



bi 



^ n n 



fc=i g=i 



Xk — Xj \ ( X f — Xj 



1 



E ^ 



&0 



As. — A, 



b 



E, 



(2) 
2 / , E
4^ 2) ( ^



with, using Lemma 3.7 (3.6.2), 

E m 



< max ^ sup 

< Cb\ 



E, 



, E[e 2 ] sup 



E, 



(2) 



£ — e 



(A.23) 



Therefore 



V2 K (2) f 



< 



E *2 



2 / 



At. — A-; 



Substituting this bound, (A.23) and (A.22) in (A.21), we obtain

< Cb\M n , 



E; 



Zj n K l 



(2) 



et - e 



bi 



where 



M n = sup 

l<j<n 



3" 



1 (Xj G *„) 



nb%g jn 



(nb$g jn ) 2 k= f 



E ^ 



2 ( X k - Xj 
Hence from (A.20), the Cauchy-Schwarz inequality, Lemma 3.10 and Lemma 3.7-(3.6.2), we deduce



i=l j=l ^ ' 
n n * 

< cM n b\ x X) 1 ( w x i - x j\\ < Cb ° ) E « 



i=l i = i 
3#i 



(2) 



*1 



< CM n b\ £ E 1 ( H X * - X iH < C6 ° K /2 ^ ^ 

t=l 3=1 ^ ' 

\ / ^—^ j = i \ 



(2) / 



6i 



Moreover, (A. 8) and Lemma 3.1 give, under (Ai), {Aj) and (A$), 



M„=(>. [b*< b " 1 ' 



nos nt 



Finally, substituting this order in the bound above, and using (A.19), we arrive at 
n n , v / -. x 2 

EE 1 ^!!^-^!! < C& ojE« [CinOn] = O p (nXb\' 2 ) [ h t + - b 

i—1 3=1 ^ ' \ ' 



This proves (A.18), and then completes the proof of the theorem.



□ 



Chapter 4



An integral nonparametric kernel estimator of the probability density function of regression errors


Abstract: This chapter is devoted to the nonparametric density estimation of the regression error using an integral method. The difference between the feasible estimator which uses the estimated regression function and the unfeasible one using the true regression function is investigated. An optimal choice of the first-step bandwidth used for estimating this regression function is proposed. We also study the asymptotic normality of the feasible integral kernel estimator and its rate-optimality.

4.1 Introduction 

Consider a sample $(X, Y), (X_1, Y_1), \ldots, (X_n, Y_n)$ of independent and identically distributed (i.i.d.) random variables, where $Y$ is the univariate dependent variable and the covariate $X$ is of dimension $d$. Let $m(\cdot)$ be the conditional expectation of $Y$ given $X$ and let $\epsilon$ be the related regression error term, so that the regression error model is

$$Y_i = m(X_i) + \epsilon_i, \qquad i = 1, \ldots, n. \qquad (4.1.1)$$

The aim of this chapter is to estimate the p.d.f. of the regression error under the assumption that the covariate $X$ and the regression error $\epsilon$ are independent. Indeed, under this assumption, we have

$$f(e) = f(e \mid x) = \varphi\left(m(x) + e \mid x\right), \qquad (4.1.2)$$

where $\varphi(\cdot \mid x)$ denotes the conditional density of $Y$ given $X = x$.


Hence, the approach proposed here is based on a two-steps procedure, which, in a first step, uses (4.1.2) and writes $f(e)$ in the integral form

$$f(e) = \int \mathbb{1}(x \in \mathcal{X})\, \varphi\left(e + m(x) \mid x\right) g(x)\,dx = \int \mathbb{1}(x \in \mathcal{X})\, \varphi\left(x, e + m(x)\right) dx,$$

where $\mathcal{X}$ is the support of the p.d.f. $g(\cdot)$ of $X$, and $\varphi(\cdot, \cdot)$ the joint density of $(X, Y)$. This formula suggests estimating $f(e)$, in a second step, by

$$\hat f_{2n}(e) = \int \mathbb{1}(x \in \mathcal{X})\, \hat\varphi_n\left(x, e + \hat m_n(x)\right) dx,$$

where $\hat\varphi_n$ and $\hat m_n$ denote respectively some nonparametric estimators of $\varphi$ and $m$. As in Chapter 2, a challenging issue is first to evaluate the impact of the estimated regression function on the final estimator of $f(\cdot)$. Next, an optimal choice of the bandwidth used to estimate the residuals is proposed. Finally, we study the asymptotic normality of the estimator $\hat f_{2n}(e)$ and its rate-optimality.

The rest of this chapter is organized as follows. Section 4.2 is devoted to the presentation of our estimators. Sections 4.3 and 4.4 group our assumptions and main results. The conclusion of this chapter is given in Section 4.5, while the proofs of our results are gathered in Section 4.6 and in two appendices.

4.2 Presentation of the estimators 

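In what follows, the bandwidths $b_0$ and $b_1$ are associated with $X$ and $h$ with $Y$, and $K_0$, $K_1$ and $K_2$ represent some kernel functions. Then for $(x, y) \in \mathbb{R}^d \times \mathbb{R}$, the nonparametric estimators of $\varphi(x, y)$ and $g(x)$ are respectively defined as

$$\hat\varphi_n(x, y) = \frac{1}{n b_1^d h} \sum_{i=1}^n K_1\left(\frac{X_i - x}{b_1}\right) K_2\left(\frac{Y_i - y}{h}\right), \qquad \hat g_n(x) = \frac{1}{n b_0^d} \sum_{i=1}^n K_0\left(\frac{X_i - x}{b_0}\right).$$

The estimation of the regression function $m(\cdot)$ is given by the Nadaraya-Watson estimator (1964)

$$\hat m_n(x) = \frac{\sum_{j=1}^n Y_j K_0\left(\frac{X_j - x}{b_0}\right)}{\sum_{j=1}^n K_0\left(\frac{X_j - x}{b_0}\right)}. \qquad (4.2.3)$$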
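
As a minimal illustration of the Nadaraya-Watson estimator (4.2.3), the sketch below treats a scalar covariate ($d = 1$). The particular kernel (an Epanechnikov-type kernel rescaled to the support $[-1/2, 1/2]$), the function names and the simulated data are assumptions made for the example; the text does not prescribe them.

```python
import math
import random


def k0(z):
    """Symmetric kernel supported on [-1/2, 1/2] that integrates to 1 (assumed choice)."""
    return 1.5 * (1.0 - 4.0 * z * z) if abs(z) <= 0.5 else 0.0


def nadaraya_watson(x, xs, ys, b0):
    """Nadaraya-Watson estimate of m(x) = E[Y | X = x] with bandwidth b0."""
    weights = [k0((xj - x) / b0) for xj in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observation within b0/2 of x; increase b0")
    return sum(w * yj for w, yj in zip(weights, ys)) / total


# Toy data: Y = sin(2*pi*X) + noise, with X uniform on [0, 1].
rng = random.Random(0)
xs = [rng.random() for _ in range(2000)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.3) for x in xs]

m_hat = nadaraya_watson(0.25, xs, ys, b0=0.1)
print(f"m_hat(0.25) = {m_hat:.3f}, true value = {math.sin(math.pi / 2):.3f}")
```

The estimate is a locally weighted average of the responses, so a smaller $b_0$ reduces the smoothing bias at the price of a larger variance.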

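Since $Y = m(X) + \epsilon$, we have

$$P\left(\epsilon \le e \mid X = x\right) = P\left(Y \le e + m(x) \mid X = x\right).$$

Then if $f$ represents the probability density function of $\epsilon$, and $\varphi$ the joint density of $(X, Y)$, it follows that

$$f(e) = \int \mathbb{1}(x \in \mathcal{X})\, \varphi\left(e + m(x) \mid x\right) g(x)\,dx = \int \mathbb{1}(x \in \mathcal{X})\, \varphi\left(x, e + m(x)\right) dx, \qquad (4.2.4)$$

where $\mathcal{X}$ is the support of the p.d.f. $g$ of the covariates. Therefore an estimator of $f(e)$ is the so-called "two-steps estimator", defined as

$$\hat f_{2n}(e) = \int \mathbb{1}(x \in \mathcal{X})\, \hat\varphi_n\left(x, e + \hat m_n(x)\right) dx. \qquad (4.2.5)$$

This estimator is a feasible estimator in the sense that it does not depend on any unknown quantity, as desirable in practice. This contrasts with the unfeasible ideal kernel estimator

$$\tilde f_{2n}(e) = \int \mathbb{1}(x \in \mathcal{X})\, \hat\varphi_n\left(x, e + m(x)\right) dx, \qquad (4.2.6)$$

which depends in particular on the unknown regression function $m(\cdot)$. It is however intuitively clear that $\hat f_{2n}(e)$ and $\tilde f_{2n}(e)$ should be close, as illustrated by the results of the next section.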
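
To make the comparison between the feasible and the unfeasible estimators concrete, here is a small simulation sketch for $d = 1$ and $\mathcal{X} = [0, 1]$. Everything specific in it is an assumption made for illustration: the kernels, the midpoint rule used for the integral over $\mathcal{X}$, the regression function, the Gaussian errors and the bandwidth values.

```python
import math
import random


def k01(z):
    """Kernel used for K0 and K1: symmetric, support [-1/2, 1/2], integrates to 1."""
    return 1.5 * (1.0 - 4.0 * z * z) if abs(z) <= 0.5 else 0.0


def k2(v):
    """Kernel for K2: support [-1, 1], three times continuously differentiable."""
    return (315.0 / 256.0) * (1.0 - v * v) ** 4 if abs(v) <= 1.0 else 0.0


def nw(x, xs, ys, b0):
    """Nadaraya-Watson estimate of m(x); None when no point falls in the window."""
    w = [k01((xj - x) / b0) for xj in xs]
    s = sum(w)
    return sum(wj * yj for wj, yj in zip(w, ys)) / s if s > 0.0 else None


def phi_hat(x, y, xs, ys, b1, h):
    """Kernel estimate of the joint density of (X, Y) at (x, y)."""
    total = sum(k01((xi - x) / b1) * k2((yi - y) / h) for xi, yi in zip(xs, ys))
    return total / (len(xs) * b1 * h)


def integral_estimator(e, xs, ys, m_func, b1, h, grid_size=60):
    """Midpoint-rule approximation of the integral over [0, 1] of phi_hat(x, e + m(x))."""
    step = 1.0 / grid_size
    acc = 0.0
    for k in range(grid_size):
        x = (k + 0.5) * step
        m_x = m_func(x)
        if m_x is not None:
            acc += phi_hat(x, e + m_x, xs, ys, b1, h) * step
    return acc


def true_m(x):
    return math.sin(2.0 * math.pi * x)


rng = random.Random(0)
n, sigma = 600, 0.5
xs = [rng.random() for _ in range(n)]
ys = [true_m(x) + rng.gauss(0.0, sigma) for x in xs]
b0, b1, h = 0.1, 0.1, 0.3

f_hat = integral_estimator(0.0, xs, ys, lambda x: nw(x, xs, ys, b0), b1, h)  # feasible
f_tilde = integral_estimator(0.0, xs, ys, true_m, b1, h)                     # unfeasible
true_f0 = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
print(f"feasible: {f_hat:.3f}  unfeasible: {f_tilde:.3f}  true f(0): {true_f0:.3f}")
```

The only difference between the two calls is the regression function plugged into the second argument of the joint density estimate, which is exactly the difference between (4.2.5) and (4.2.6).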

4.3 Assumptions 

(H1) the support $\mathcal{X}$ of $X$ is a known compact subset of $\mathbb{R}^d$,

(H2) the p.d.f. $g(\cdot)$ of the i.i.d. covariates $X, X_i$ has continuous second order partial derivatives over $\mathcal{X}$. Moreover, there exists $a > 0$ such that $g(x) \ge a$ for all $x$ in the support $\mathcal{X}$,

(H3) the regression function $m(\cdot)$ has continuous second order partial derivatives over $\mathcal{X}$,

(H4) the i.i.d. centered error regression terms $\epsilon, \epsilon_i$'s, have finite 6th moments, and are independent of the covariates $X, X_i$'s,

(H5) the probability density function $f$ of $\epsilon$ has bounded continuous second order derivatives over $\mathbb{R}$, and satisfies, for $h_p(e) = e^p f(e)$, $\sup_{e \in \mathbb{R}} |h_p^{(k)}(e)| < \infty$, $p \in [0, 6]$, $k \in [0, 2]$,

(H6) the density $\varphi$ of $(X, Y)$ has bounded continuous second order partial derivatives over $\mathbb{R}^d \times \mathbb{R}$,

(H7) the kernel functions $K_0$ and $K_1$ are symmetric, continuous over $\mathbb{R}^d$ with support in $[-1/2, 1/2]^d$ and $\int K_0(z)\,dz = 1$, $\int K_1(z)\,dz = 1$,

(H8) the kernel function $K_2$ has a compact support, is three times continuously differentiable over $\mathbb{R}$, and satisfies $\int K_2(v)\,dv = 1$, $\int v K_2(v)\,dv = 0$ and $\int |v^p K_2^{(\ell)}(v)|\,dv < \infty$ for $p, \ell$ in $[0, 3]$,

(H 9 ) the bandwidth b decreases to and satisfies ln(l/6o)/ln(lnn) — > 00 and b d /(nbQ d ) p = 0(6q P ), 
p G [0, 6], when n — > 00, 

(H10) the bandwidths bi and h decrease to and are such that nb\ d -> 00 and n (d+») h 7(d+4) 

— > 00 

when n — > 00. 

Assumptions (H2), (H3), (H5) and (H6) impose that all the functions to be estimated nonparametrically have two bounded derivatives. Consequently the conditions $\int v K_j(v)\,dv = 0$, $j = 0, 1, 2$, as assumed in (H7) and (H8), represent standard conditions ensuring that the biases of the resulting nonparametric estimators (4.2.3) and (4.2.6) are respectively of order $b_0^2$ and $b_1^2 + h^2$. Assumption (H4) states independence between the regression error terms and the covariates, which is the main condition for (4.1.2) to hold. The differentiability of $K_2$ imposed in (H8) is more specific to our two-steps estimation method. Assumption (H8) is used to expand the two-steps kernel estimator $\hat f_{2n}$ in (4.2.5) around the unfeasible one $\tilde f_{2n}$ from (4.2.6), using the derivatives of $K_2$ up to third order and the differences $\hat m_{in}(x) - m(x)$, $i \in [1, n]$, where $\hat m_{in}(x)$ is a leave-one-out version of the kernel regression estimator (4.2.3),

$$\hat m_{in}(x) = \frac{\sum_{j=1,\, j \ne i}^n Y_j K_0\left(\frac{X_j - x}{b_0}\right)}{\sum_{j=1,\, j \ne i}^n K_0\left(\frac{X_j - x}{b_0}\right)}. \qquad (4.3.7)$$

Assumption (H9) is a standard condition to obtain uniform convergence of the regression estimator $\hat m_n$ in (4.2.3) (see for instance Einmahl and Mason, 2005), and also gives a similar consistency result for the leave-one-out estimator $\hat m_{in}$. Assumption (H10) is needed in the study of the difference between the feasible estimator $\hat f_{2n}$ and the unfeasible estimator $\tilde f_{2n}$.
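
The leave-one-out version of the regression estimator simply drops the $i$-th observation from both sums of the Nadaraya-Watson ratio. A minimal sketch for $d = 1$ follows; the kernel and the three data points are assumptions chosen so that the result can be checked by hand.

```python
def k0(z):
    """Symmetric kernel supported on [-1/2, 1/2] that integrates to 1 (assumed choice)."""
    return 1.5 * (1.0 - 4.0 * z * z) if abs(z) <= 0.5 else 0.0


def loo_nadaraya_watson(i, x, xs, ys, b0):
    """Nadaraya-Watson estimate of m(x) computed without the i-th observation."""
    num = 0.0
    den = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        if j == i:
            continue  # leave observation i out of both sums
        w = k0((xj - x) / b0)
        num += w * yj
        den += w
    if den == 0.0:
        raise ValueError("no remaining observation within b0/2 of x")
    return num / den


xs = [0.0, 0.1, 0.2]
ys = [1.0, 5.0, 3.0]
# Evaluate at X_1 = 0.1 without using (X_1, Y_1): the two remaining points
# receive equal weights, so the estimate is the plain average of 1.0 and 3.0.
m_loo = loo_nadaraya_watson(1, 0.1, xs, ys, b0=0.5)
print(m_loo)
```

Because the $i$-th response is excluded, the estimate at $X_i$ does not involve $\epsilon_i$, which is the property exploited in the proofs.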

4.4 Main results 

Our first main result establishes the order of the difference $\hat f_{2n}(e) - f(e)$. This is given in the following subsection. Next, we give the optimal bandwidths needed to estimate $f(e)$. We conclude this section by establishing the asymptotic normality of the estimator $\hat f_{2n}(e)$.

4.4.1 Pointwise weak consistency

In this subsection we deal with the order of the difference $\hat f_{2n}(e) - f(e)$. We show that for $n$ large enough, the estimator $\hat f_{2n}(e)$ is very close to the theoretical density $f(e)$, as illustrated by the following result.

Theorem 4.1. Suppose that Assumptions (H1)-(H10) hold. Then, for $n$ large enough, we have

1/2 



where 



and 



fan(e) - /(e) = O p \AMSE(b!,h) + RT n (b Q , b u h) 
(/ 2 „(e)-/(e) 



AMSE{bi,h) =E ri 



P I bf + h 4 



nbi J ' 






The result of Theorem 4.1 is based on the evaluation of the difference between $\hat f_{2n}(e)$ and $\tilde f_{2n}(e)$. This evaluation gives an indication about the impact of the estimation of $m(\cdot)$ on the nonparametric estimation of the regression error density.

4.4.2 Optimal first-step and second-step bandwidths for the pointwise weak consistency

Our next result deals with the choice of the optimal bandwidth $b_0$ used in the nonparametric estimation of the p.d.f. of the regression error term. We have the following theorem.

Theorem 4.2. Suppose that Assumptions (Hi) — (H w ) are satisfied, and assume 6q = Define 

K = K( h ) = argmini?T„(6 ,6o,/i), 

bo 

where the minimization is performed over bandwidths $b_0$ fulfilling (H9). Then the optimal bandwidth $b_0^*$ satisfies



6q x max 



1 \ d + 4 / 1 



h 3 ' V n 3 h 7 



and we have 



RT n (b%, &q, h) x — h max ■ 
n 



2 h 3 ' V n 3 h 7 



The next theorem gives the conditions under which the estimator $\hat f_{2n}(e)$ reaches the optimal rate $n^{-2/5}$ when $b_0$ takes the value $b_0^*$. We prove that for $d \le 2$, the bandwidth that minimizes the term $AMSE(b_0^*, h) + RT_n(b_0^*, b_0^*, h)$ has the same order as $n^{-1/5}$, leading to the optimal order $n^{-2/5}$ for the term $(AMSE(b_0^*, h) + RT_n(b_0^*, b_0^*, h))^{1/2}$.
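
The orders $n^{-1/5}$ and $n^{-2/5}$ are the usual ones for a univariate kernel density estimator. As a rough numerical check, the sketch below minimizes only the part $h^4 + 1/(nh)$ of the bound of Lemma 4.2; ignoring the $b_1$ term and the remainder $RT_n$ is a simplifying assumption of this illustration, and the grid search is a hypothetical device rather than the argument used in the proofs.

```python
import math


def amse_order(h, n):
    """Squared-bias plus variance order for the second-step bandwidth h."""
    return h ** 4 + 1.0 / (n * h)


def argmin_h(n, grid_size=20000):
    """Grid search on a log-spaced grid of [1e-4, 1]; returns (h, value) at the minimum."""
    best_h, best_val = None, math.inf
    for k in range(grid_size + 1):
        h = 10.0 ** (-4.0 + 4.0 * k / grid_size)
        val = amse_order(h, n)
        if val < best_val:
            best_h, best_val = h, val
    return best_h, best_val


n1, n2 = 10 ** 4, 10 ** 6
h1, v1 = argmin_h(n1)
h2, v2 = argmin_h(n2)

# Empirical exponents: slopes on a log-log scale between the two sample sizes.
slope_h = math.log(h2 / h1) / math.log(n2 / n1)
slope_rate = math.log(math.sqrt(v2) / math.sqrt(v1)) / math.log(n2 / n1)
print(f"bandwidth exponent ~ {slope_h:.3f}, rate exponent ~ {slope_rate:.3f}")
```

The closed-form minimizer of $h^4 + 1/(nh)$ is $h = (4n)^{-1/5}$, so the two slopes should be close to $-1/5$ and $-2/5$ respectively.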

Theorem 4.3. Assume that (H1)-(H10) hold and set

$$h^* = \arg\min_h \left\{ AMSE(b_0^*, h) + RT_n(b_0^*, b_0^*, h) \right\},$$

where $b_0^* = b_0^*(h)$ is defined as in Theorem 4.2. Then



1. For d < 2, the optimal bandwidth h* satisfies 

h* x 

and we have 



\AMSE(bZ,h*) + RT n (b* ,h*,h*)Y x 

2. For d > 3, h* satisfies 

. 3 



and we have 



\AMSE(b* ,h*) + RT n (b*,b*,h*)y x Q ' 
Theorem 4.3 follows from Theorem 4.2, which reveals that for $b_1$ proportional to $n^{-1/5}$, the bandwidth $b_0^*$ has the same order as



max 



I \ 5(d+4) / 1 \ 5(2d + 4) { 1\ 5 < 2d + 4 ' 

n ) \n ) | In. 



For $d \le 2$, this order of $b_0^*$ is less than the one of the optimal bandwidth $b_0$ obtained for pointwise or mean square estimation of $m(\cdot)$ using a nonparametric kernel estimator. In fact, as seen in Chapter 3, the optimal bandwidth $b_0$ for estimating $m(\cdot)$ is obtained by minimizing the order of the risk function

$$r_n(b_0) = E\left[ \int \mathbb{1}(x \in \mathcal{X})\, \left(\hat m_n(x) - m(x)\right)^2 \hat g_n^2(x)\, w(x)\,dx \right],$$

which has the same order as $b_0^4 + 1/(n b_0^d)$, leading to the optimal bandwidth $b_0 = n^{-1/(d+4)}$. For $d = 1$, the optimal order of $b_0^*$ is $n^{-(1/5) \times (4/3)}$, which goes to 0 slightly faster than $n^{-1/5}$, the optimal order of the bandwidth for the mean square nonparametric estimation of $m(\cdot)$. For $d = 2$, the optimal order of $b_0^*$ is $n^{-1/5}$. Again this order goes to 0 faster than the order $n^{-1/6}$ of the optimal bandwidth for the nonparametric estimation of the regression function with two covariates. But for $d \ge 3$, we note that the order of $b_0^*$ goes to 0 more slowly than $b_0$. Hence these situations reveal that the optimal $\hat m_n(\cdot)$ for estimating $f(\cdot)$ should have a lower bias and a higher variance than the optimal kernel regression estimator of $m(\cdot)$. This situation is the same as the one noticed in Wang, Cai, Brown and Levine (2008) for the estimation of the conditional variance function in a heteroscedastic regression model. However these authors do not investigate the order of the optimal bandwidth to be used for estimating the regression function in their heteroscedastic setup. Hence, as in Chapter 3, we conclude that an estimator of $m(\cdot)$ with smaller bias should be preferred in our framework, compared to the case where the regression function $m(\cdot)$ is the parameter of interest.
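
The comparison of bandwidth orders for $d = 1$ and $d = 2$ can be tabulated directly. The sketch below only uses the exponents quoted in the discussion above ($n^{-(1/5)\times(4/3)}$ and $n^{-1/5}$ for the first-step bandwidth, $n^{-1/(d+4)}$ for regression estimation); the dictionary, the function name and the sample size used for display are hypothetical.

```python
from fractions import Fraction

# Exponents a such that the first-step bandwidth is of order n**(-a),
# as quoted in the text for d = 1 and d = 2.
first_step_exponent = {1: Fraction(1, 5) * Fraction(4, 3), 2: Fraction(1, 5)}


def regression_exponent(d):
    """Exponent of the mean-square optimal bandwidth n**(-1/(d+4)) for estimating m."""
    return Fraction(1, d + 4)


n = 10 ** 5
for d in (1, 2):
    a_star = first_step_exponent[d]
    a_reg = regression_exponent(d)
    print(
        f"d={d}: first-step order n^-{a_star} = {n ** -float(a_star):.4f}, "
        f"regression order n^-{a_reg} = {n ** -float(a_reg):.4f}"
    )
```

In both dimensions the first exponent is the larger one, so the first-step bandwidth is the smaller of the two, in line with the undersmoothing interpretation given above.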

4.4.3 Asymptotic normality 

The aim of this subsection is to establish the asymptotic normality of the estimator $\hat f_{2n}(e)$. We have the following result.

Theorem 4.4. Suppose that $b_0 = b_1$ and assume

(H11): $n b_0^{d+4} = O(1)$, $n b_0^4 h = o(1)$, $n b_0^d h^3 \to \infty$,

when $n \to \infty$. Then under (H1)-(H10), we have

$$\sqrt{nh}\left(\hat f_{2n}(e) - \bar f_{2n}(e)\right) \xrightarrow{d} N\left(0,\; f(e) \int K_2^2(v)\,dv\right),$$

where

$$\bar f_{2n}(e) = f(e) + \frac{b_1^2}{2} \int \mathbb{1}(x \in \mathcal{X})\, \frac{\partial^2 \varphi(x, e + m(x))}{\partial^2 x}\,dx \int z K_1(z) z^\top dz + \frac{h^2}{2} \int \mathbb{1}(x \in \mathcal{X})\, \frac{\partial^2 \varphi(x, e + m(x))}{\partial^2 y}\,dx \int v^2 K_2(v)\,dv + o\left(b_1^2 + h^2\right).$$

As seen in the comments of Theorem 3.4 in Chapter 3, we can check that for $d = 1$, $h = h^*$ and $b_1 = b_0 = b_0^*$, the conditions of Assumption (H11) are realizable with the bandwidths $b_0^*$ and $h^*$. But with these bandwidths, the last constraint of (H11) is not satisfied for $d = 2$, since for $b_0 = b_0^*$ and $h = h^*$, $n b_0^d h^3$ is bounded when $n$ goes to infinity.
4.5 Conclusion

In this chapter, we investigated the nonparametric kernel estimation of the p.d.f. of the regression error using an integral method. The difference between the feasible estimator which uses the estimated regression function and the unfeasible one using the theoretical regression function is studied. An optimal choice of the first-step bandwidth used to estimate the regression function is also established. Again, the asymptotic normality of the feasible kernel estimator and its rate-optimality are established. As in Chapter 2, the contribution of the present chapter is the analysis of the influence of the estimated regression function on the kernel estimator of the regression error p.d.f.

The strategy used here is based on a two-steps procedure which, in a first step, integrates a conditional p.d.f. as given in (4.2.4). In a second step, we build the kernel estimator of $f(e)$ by estimating nonparametrically the unknown functions in the integral terms of (4.2.4). While this strategy can avoid the curse of dimensionality, a main aspect of our setup is to evaluate the impact of the estimation of $m(\cdot)$ on the final integral kernel estimator of $f(\cdot)$ in the first nonparametric step, and to determine the optimal choice of the first-step bandwidth $b_0$. For such a choice of $b_0$, our results suggest that the optimal bandwidth to be used should be smaller than the optimal bandwidth for the mean square estimation of $m(\cdot)$. This means that the best choice for $b_0$ is the one such that the estimator $\hat m_n(\cdot)$ of the regression function has a lower bias and a higher variance than the optimal kernel regression estimator. With this choice of $b_0$, we show that for $d \le 2$, the estimator $\hat f_{2n}(e)$ of $f(e)$ can reach the optimal rate $n^{-2/5}$, which corresponds exactly to the rate reached by the kernel density estimator of a univariate variable. This reveals that for $d \le 2$, the integral kernel estimator $\hat f_{2n}(e)$ is not affected by the curse of dimensionality, since there is no negative influence caused by the estimation of the optimal first-step bandwidth $b_0^*$.
4.6 Proofs section

Proof of Theorem 4.1

The proof is a consequence of the two following lemmas.

Lemma 4.1. Under (H1)-(H10), we have, when $n$ goes to infinity,



/2n(e) - / 2 n(e) 



+o P 
+o P 



1 



1 



(6g V bf) 



h 2 



nb° r 



bfh 5 \° ntfij n% d h 3 



-,1/2 



1/2 



nb a n 



1/2 



Lemma 4.2. If (H1)-(H10) hold, then

$$\tilde f_{2n}(e) - f(e) = O_P\left(b_1^4 + h^4 + \frac{1}{nh}\right)^{1/2}.$$



Let us now turn to the proof of Theorem 4.1. Using Lemmas 4.2 and 4.1, we have



1/2 



/ 2 „(e) - /(e) = (/ 2 „(e) - /(e)) + / 2 „(e) - / 2 „(e) 



P [b\ + h 4 + 



(b d o V bf] 



nh 



1/2 



Op 



+Op 



6j 



which yields the result of the Theorem. 



We now prove Lemmas 4.1 and 4.2.



Proof of Lemma 4.1



Let us introduce additional notation. Let $\hat m_{in}(x)$ be as in (3.2.4) and define



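$$S_n(x) = \frac{1}{n b_1^d h^2} \sum_{i=1}^n \left(\hat m_{in}(x) - m(x)\right) K_1\left(\frac{X_i - x}{b_1}\right) K_2^{(1)}\left(\frac{Y_i - e - m(x)}{h}\right),$$

$$T_n(x) = \frac{1}{n b_1^d h^3} \sum_{i=1}^n \left(\hat m_{in}(x) - m(x)\right)^2 K_1\left(\frac{X_i - x}{b_1}\right) K_2^{(2)}\left(\frac{Y_i - e - m(x)}{h}\right).$$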

The proof of Lemma 4.1 is based on the following results.

Lemma 4.3. Define

$$S_n = \int \mathbb{1}(x \in \mathcal{X})\, S_n(x)\,dx, \qquad T_n = \int \mathbb{1}(x \in \mathcal{X})\, T_n(x)\,dx.$$



□ 
Then under (H1)-(H10), we have



Ov 



T n — Op 



-,1/2 



nbfh 5 



2 &2 



nb$ n 2 b 2 d h 3 



1/2 



Lemma 4.4. Define 

R n (x) = ^ £ (^W*) - m(x)) 3 ^ (^p) ^(1 - ufK^ (^^] d«, 
w/iere in (x, u) = e — m(x) — u (rh in (x) — m(x)), and set 

R n = j 1 (x e A") R n (x)dx. 

If (H1)-(H10) hold, then



i?„ — Op 



1/2 



- x\ f 1 (i) / is-Mm)' 



Lemma 4.5. Set

p - (a;) = ^ § ( ™- (a;) - ^- (a;)) ^ (^&r0 i ^ v * 

w/iere 9i n (x,t) = e + m in (x) +t(fh n (x) — m in (x)), and define 

P n = J 1 (x e X)P n (x)dx. 

Then under (H1)-(H10), we have



dt, 



Pn = Op 



l , b* v bj \ 1/2 



The proofs of these Lemmas are stated in Appendix B. 
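Let us now return to the proof of Lemma 4.1. Observe that

$$\hat\varphi_n\left(x, e + \hat m_n(x)\right) - \hat\varphi_n\left(x, e + m(x)\right) = \frac{1}{n b_1^d h} \sum_{i=1}^n K_1\left(\frac{X_i - x}{b_1}\right) \left[ K_2\left(\frac{Y_i - e - \hat m_n(x)}{h}\right) - K_2\left(\frac{Y_i - e - m(x)}{h}\right) \right], \qquad (4.6.1)$$

where

$$K_2\left(\frac{Y_i - e - \hat m_n(x)}{h}\right) - K_2\left(\frac{Y_i - e - m(x)}{h}\right) = \left[ K_2\left(\frac{Y_i - e - \hat m_{in}(x)}{h}\right) - K_2\left(\frac{Y_i - e - m(x)}{h}\right) \right] + \left[ K_2\left(\frac{Y_i - e - \hat m_n(x)}{h}\right) - K_2\left(\frac{Y_i - e - \hat m_{in}(x)}{h}\right) \right]. \qquad (4.6.2)$$

Since $K_2$ is three times continuously differentiable under (H8), Taylor's theorem with the integral remainder gives

$$K_2\left(\frac{Y_i - e - \hat m_{in}(x)}{h}\right) - K_2\left(\frac{Y_i - e - m(x)}{h}\right) = -\frac{1}{h}\left(\hat m_{in}(x) - m(x)\right) K_2^{(1)}\left(\frac{Y_i - e - m(x)}{h}\right) + \frac{1}{2h^2}\left(\hat m_{in}(x) - m(x)\right)^2 K_2^{(2)}\left(\frac{Y_i - e - m(x)}{h}\right)$$

$$- \frac{1}{2h^3}\left(\hat m_{in}(x) - m(x)\right)^3 \int_0^1 (1 - u)^2 K_2^{(3)}\left(\frac{Y_i - e - m(x) - u\left(\hat m_{in}(x) - m(x)\right)}{h}\right) du. \qquad (4.6.3)$$

Again, under (H8), we have

$$K_2\left(\frac{Y_i - e - \hat m_n(x)}{h}\right) - K_2\left(\frac{Y_i - e - \hat m_{in}(x)}{h}\right) = -\frac{1}{h}\left(\hat m_n(x) - \hat m_{in}(x)\right) \int_0^1 K_2^{(1)}\left(\frac{Y_i - e - \hat m_{in}(x) - t\left(\hat m_n(x) - \hat m_{in}(x)\right)}{h}\right) dt.$$

Hence, defining $S_n(x)$, $T_n(x)$, $R_n(x)$ and $P_n(x)$ respectively as in Lemmas 4.3, 4.4 and 4.5, the equality above, (4.6.3), (4.6.2) and (4.6.1) give

$$\hat\varphi_n\left(x, e + \hat m_n(x)\right) - \hat\varphi_n\left(x, e + m(x)\right) = -S_n(x) + \frac{T_n(x)}{2} - \frac{R_n(x)}{2} - P_n(x),$$

so that

$$\hat f_{2n}(e) - \tilde f_{2n}(e) = -S_n + \frac{T_n}{2} - \frac{R_n}{2} - P_n$$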



+o P 



+ (b d wbp 



1 



1 



2 .4 



nb<th 5 \ nbi nbi n 2 b¥h? 



1/2 



b^\/bf 



1/2 
Moreover, since under (Hg) bo goes to and that b d / ('nPb^' 9 ) = O(b lp ), this gives for p = 1, 



Tib* 



o 



nb^ ) ii 



Hence it follows that 

/2n(e) - / 2n (e) 



T 



&2 



-Op 



(6gv6f) 
(6gv6f) ( 



n&f/i 3 1 



nl/2 



1 



6f /i 5 V ° nb d 



n 2 b 2 d h 3 



1/2 



b^Vbf 



1/2 



which ends the proof of the Lemma. 
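
The third-order Taylor expansion with integral remainder used in (4.6.3) can be verified numerically for a concrete kernel. In the sketch below, the choice $K_2(v) = \frac{315}{256}(1 - v^2)^4$ on $[-1, 1]$ is an assumption (it has compact support and three continuous derivatives, as (H8) requires), and the remainder integral is approximated by Simpson's rule. In the notation of (4.6.3), `a` plays the role of $(Y_i - e - m(x))/h$ and `delta` that of $-(\hat m_{in}(x) - m(x))/h$.

```python
C2 = 315.0 / 256.0  # normalising constant of the assumed kernel


def k2(v):
    return C2 * (1.0 - v * v) ** 4 if abs(v) <= 1.0 else 0.0


def k2_d1(v):
    return -8.0 * C2 * v * (1.0 - v * v) ** 3 if abs(v) <= 1.0 else 0.0


def k2_d2(v):
    return -8.0 * C2 * (1.0 - v * v) ** 2 * (1.0 - 7.0 * v * v) if abs(v) <= 1.0 else 0.0


def k2_d3(v):
    return 48.0 * C2 * v * (1.0 - v * v) * (3.0 - 7.0 * v * v) if abs(v) <= 1.0 else 0.0


def simpson(func, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    step = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * func(a + k * step)
    return total * step / 3.0


def taylor_gap(a, delta):
    """Absolute difference between K2(a + delta) and its expansion with integral remainder."""
    remainder = (delta ** 3 / 2.0) * simpson(
        lambda u: (1.0 - u) ** 2 * k2_d3(a + u * delta), 0.0, 1.0
    )
    expansion = k2(a) + delta * k2_d1(a) + (delta ** 2 / 2.0) * k2_d2(a) + remainder
    return abs(k2(a + delta) - expansion)


print(taylor_gap(0.3, -0.2))  # both points inside the support
print(taylor_gap(0.9, 0.3))   # the increment crosses the boundary of the support
```

Both gaps should be at the level of the quadrature error, including the case where the argument leaves the support, which is where the global smoothness of $K_2$ matters.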



□ 



Proof of Lemma 4.2

Observe that

$$\tilde f_{2n}(e) - f(e) = \left(\tilde f_{2n}(e) - E\tilde f_{2n}(e)\right) + \left(E\tilde f_{2n}(e) - f(e)\right). \qquad (4.6.4)$$



For the first term in (4.6.4), the independence of the (Xi,Yi)'s gives 

2" 

E ' 



(/ 2 „(e)-E/ 2n (e)J = Var(/ 2n (e) 
1 " r 



Var 



^&HM — h — Ux 



(nbf 



1 / 

w*% Ya *\J HxeX)K - 

n r 

< , * Vie t(xex)K 



, ,XLZ*)K 2 ( Yi - £ 7 m{x) \dx 



b x 



bi 



h 



Moreover, note that by (H1), (H3) and (H7)-(H8), the changes of variables $x = x_1 + b_1 z_1$, $y_1 = e + m(x_1 + b_1 z_1) + h v_1$ and the Cauchy-Schwarz inequality give, since $\varphi(\cdot, \cdot)$ is bounded under Assumption (H6),



' — x\ fYi — e — m(x) 
1.1 I — , K 2 I ; I dx 



E [ t(xeX)K_ 

7 = 1 

= „/" d Xl [\[t(xGX)K 1 ( X ^-) K J yi - e - m ^ 

JR d JR U \ "I / 



fi.T 



ip(x x ,y x )dy x 



[ dxi [ V} [ l(x x + b x z x eX) K x (z x )K 2 

JR d Jl L J 



j/i - e - m(xi + Mi) 
ft 



dzi 



<p(x x ,y x )dy x 



< Cnb\ d h j dz x Kf(z x ) [ t(x x +b x z x € X)dx x [ K%(v x )dv x . 

JR d J JR 

Hence from the two bounds above and the Tchebychev inequality, we deduce 



\[ \tilde f_{2n}(e) - \mathbb{E}\tilde f_{2n}(e) = O_P\left(\frac{1}{nh}\right)^{1/2}. \tag{4.6.5} \]
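We now compute the order of the second term in (4.6.4). Observe that
\[
\begin{aligned}
\mathbb{E}\tilde f_{2n}(e)
&= \mathbb{E}\left[\frac{1}{nb_1^dh}\sum_{i=1}^n\int \mathbf{1}(x\in\mathcal{X})\,K_1\left(\frac{X_i-x}{b_1}\right)K_2\left(\frac{Y_i-e-m(x)}{h}\right)dx\right] \\
&= \frac{1}{b_1^dh}\,\mathbb{E}\left[\int \mathbf{1}(x\in\mathcal{X})\,K_1\left(\frac{X_1-x}{b_1}\right)K_2\left(\frac{Y_1-e-m(x)}{h}\right)dx\right] \\
&= \int \mathbf{1}(x\in\mathcal{X})\left[\int dz_1\int K_1(z_1)K_2(v_1)\,\varphi\left(x+b_1z_1, e+m(x)+hv_1\right)dv_1\right]dx.
\end{aligned}
\tag{4.6.6}
\]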



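By (H6), a second-order Taylor expansion yields, for $z_1$ and $v_1$ in the supports of $K_1$ and $K_2$, and
$h$ and $b_1$ small enough,
\[
\begin{aligned}
\varphi(x+b_1z_1, e+m(x)+hv_1) ={}& \varphi(x, e+m(x)) + b_1\frac{\partial\varphi(x,e+m(x))}{\partial x}z_1^{\top} + hv_1\frac{\partial\varphi(x,e+m(x))}{\partial y} \\
&+ \frac{b_1^2}{2}\,z_1\frac{\partial^2\varphi(x+\theta b_1z_1, e+m(x)+\theta hv_1)}{\partial^2x}z_1^{\top} \\
&+ b_1hv_1\frac{\partial^2\varphi(x+\theta b_1z_1, e+m(x)+\theta hv_1)}{\partial x\,\partial y}z_1^{\top} \\
&+ \frac{h^2}{2}\frac{\partial^2\varphi(x+\theta b_1z_1, e+m(x)+\theta hv_1)}{\partial^2y}v_1^2,
\end{aligned}
\]
for some $\theta = \theta(x, e, b_1z_1, hv_1)$ in $[0,1]$. This gives, since $\int K_1(z)dz = \int K_2(v)dv = 1$ and since $\int zK_1(z)dz$
and $\int vK_2(v)dv$ vanish under (H7)-(H8),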



dzi / Ki(zi)K2(v\)ip (x + b\Zi, e + m(x) + hv±) dv\ 
Jr 

. , 6? d 2 ip(x, e + mix)) f . , t h 2 d 2 ip(x, e + m(x)) f 2 , 

-<p(x,e + m(x)) - — v / «Ko(«)z' d«- — v / v I K 1 (v)dv 



2 



' 2 



2 <9 2 x 

d 2 <f(x + 9h a z, e + m(x) + 6h\v) d 2 ip(x, e + m{x)) 

d 2 x d 2 x 

d 2 tp(x + 9biz, e + m(x) + 9b\v) d 2 <p(x, e + m(x)) 

dxdy dxdy 

d 2 f(x + 0biz, e + 771 (x) + 9b\v) d 2 ip(x, e + m{x)) 



d 2 y 



d 2 y 



d 2 y 

z^ K\(z)K-i{v)dzdv 

z T K\ [z)Ki (v)dzdv 
v 2 Ki(z)K2(v)dzdv. 



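Hence by the Lebesgue Dominated Convergence Theorem, we have, using (4.6.6) and (4.2.4),
\[
\begin{aligned}
\mathbb{E}\tilde f_{2n}(e) ={}& \int \mathbf{1}(x\in\mathcal{X})\,\varphi(x, e+m(x))\,dx
+ \frac{b_1^2}{2}\int \mathbf{1}(x\in\mathcal{X})\frac{\partial^2\varphi(x,e+m(x))}{\partial^2x}dx\int zK_1(z)z^{\top}dz \\
&+ \frac{h^2}{2}\int \mathbf{1}(x\in\mathcal{X})\frac{\partial^2\varphi(x,e+m(x))}{\partial^2y}dx\int v^2K_2(v)dv + o\left(b_1^2+h^2\right) \\
={}& f(e) + \frac{b_1^2}{2}\int \mathbf{1}(x\in\mathcal{X})\frac{\partial^2\varphi(x,e+m(x))}{\partial^2x}dx\int zK_1(z)z^{\top}dz \\
&+ \frac{h^2}{2}\int \mathbf{1}(x\in\mathcal{X})\frac{\partial^2\varphi(x,e+m(x))}{\partial^2y}dx\int v^2K_2(v)dv + o\left(b_1^2+h^2\right),
\end{aligned}
\tag{4.6.7}
\]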



so that
\[ \mathbb{E}\tilde f_{2n}(e) - f(e) = O\left(b_1^2 + h^2\right). \]
Finally, combining this result with (4.6.5) and (4.6.4), we arrive at
\[ \tilde f_{2n}(e) - f(e) = O_P\left(b_1^4 + h^4 + \frac{1}{nh}\right)^{1/2}. \]
□
Proof of Theorem 4.2

Observe that 



RT n (b ,b ,h) = b 4 +—(b 4 +- u )+- + —(b 4 + 



1 


(bt + 






nh 3 






' n 


1 


1 






b d h 3 


+ P 




nbl) 



1 f,A 1 ^ 



III 



3 bi f ^ 3 



h 7 \° nb d 



and note that
\[ \left(\frac{1}{n^2h^3}\right)^{\frac{1}{d+4}} = \max\left\{\left(\frac{1}{n^2h^3}\right)^{\frac{1}{d+4}}, \left(\frac{1}{n^3h^7}\right)^{\frac{1}{2d+4}}\right\} \]
if and only if $n^{4-d}h^{d+16} \to \infty$. To find the order of $b_0^*$, we shall deal with the cases $nb_0^{d+4} \to \infty$
and $nb_0^{d+4} = O(1)$.

First assume that $nb_0^{d+4} \to \infty$. More precisely, we suppose that $b_0$ is in $[(u_n/n)^{1/(d+4)}, \infty)$, where
$u_n \to \infty$. Since $1/(nb_0^d) = O(b_0^4)$ for all these $b_0$, we have



1,4 



Hence the order of $b_0^*$ is computed by minimizing the function

Since this function is increasing in $b_0$, the minimum of $RT_n(\cdot, \cdot, h)$ is achieved for $b_{0*} = (u_n/n)^{1/(d+4)}$.
We shall show later on that this choice of $b_{0*}$ is irrelevant compared with the one arising when
$nb_0^{d+4} = O(1)$.

Consider now the case nb^ +4 = 0(1) i.e b 4 = O (l/(n&{f)). This gives, since n6g d diverges 
under (H 9 ), using ^/(nfig*)* = OQff), p = 2, 

nh 3 ( 6 " + ^) ~ ^d^' + = ° ( n 362^7 

P ( 6 ° + n&f ) = ° i^bjh 3 ) and # ( 5 " + ribj) ~ (r^6fV 
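Moreover, if $nb_0^dh^4 \to \infty$, we have
\[ \frac{1}{n^3b_0^{2d}h^7} = o\left(\frac{1}{n^2b_0^dh^3}\right), \qquad RT_n(b_0, b_0, h) \asymp b_0^4 + \frac{1}{n^2b_0^dh^3} + \frac{1}{n}. \]
Hence in this case, the order of $b_0^*$ is obtained by finding the minimum of the function
$b_0^4 + 1/(n^2b_0^dh^3) + 1/n$. The minimization of this function gives a solution $b_0$ such that
\[ b_0 \asymp \left(\frac{1}{n^2h^3}\right)^{\frac{1}{d+4}}, \qquad RT_n(b_0, b_0, h) \asymp \frac{1}{n} + \left(\frac{1}{n^2h^3}\right)^{\frac{4}{d+4}}. \]
This value satisfies the constraints $nb_0^{d+4} = O(1)$ and $nb_0^dh^4 \to \infty$ when $n^{4-d}h^{d+16} \to \infty$.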
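The minimization step above can be checked numerically. The sketch below is only an illustration: the values of `n`, `h` and `d` are arbitrary, and the helper names (`risk_order`, `closed_form_minimizer`, `golden_section_min`) are hypothetical, not taken from the thesis. It minimizes $b_0^4 + 1/(n^2b_0^dh^3) + 1/n$ by golden-section search and compares the result with the first-order condition $b_0^{d+4} = d/(4n^2h^3)$, which has the order $(1/(n^2h^3))^{1/(d+4)}$ stated in the text.

```python
import math


def risk_order(b0, n, h, d):
    """Leading terms b0^4 + 1/(n^2 b0^d h^3) + 1/n of RT_n(b0, b0, h) in this regime."""
    return b0 ** 4 + 1.0 / (n ** 2 * b0 ** d * h ** 3) + 1.0 / n


def closed_form_minimizer(n, h, d):
    """Root of the first-order condition 4 b0^3 = d / (n^2 b0^(d+1) h^3)."""
    return (d / (4.0 * n ** 2 * h ** 3)) ** (1.0 / (d + 4))


def golden_section_min(func, lo, hi, iterations=200):
    """Golden-section search for the minimizer of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        left = hi - inv_phi * (hi - lo)
        right = lo + inv_phi * (hi - lo)
        if func(left) < func(right):
            hi = right  # the minimizer lies in [lo, right]
        else:
            lo = left   # the minimizer lies in [left, hi]
    return 0.5 * (lo + hi)


# Arbitrary illustrative values (assumptions, not values used in the thesis).
n, h, d = 1000, 0.1, 2
b_numeric = golden_section_min(lambda b: risk_order(b, n, h, d), 1e-3, 5.0)
b_closed = closed_form_minimizer(n, h, d)
rate = (1.0 / (n ** 2 * h ** 3)) ** (1.0 / (d + 4))
print(b_numeric, b_closed, b_closed / rate)
```

The ratio `b_closed / rate` equals the constant $(d/4)^{1/(d+4)}$, which does not depend on $n$ or $h$; this is what the relation $\asymp$ expresses.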
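If now $nb_0^{d+4} = O(1)$ but $nb_0^dh^4 = O(1)$, we have
\[ \frac{1}{n^2b_0^dh^3} = O\left(\frac{1}{n^3b_0^{2d}h^7}\right), \qquad RT_n(b_0, b_0, h) \asymp b_0^4 + \frac{1}{n^3b_0^{2d}h^7} + \frac{1}{n}. \]
In this case, the order of $b_0^*$ is achieved by minimizing the function $b_0^4 + 1/(n^3b_0^{2d}h^7) + 1/n$, for
which the solution $b_0$ verifies
\[ b_0 \asymp \left(\frac{1}{n^3h^7}\right)^{\frac{1}{2d+4}}, \qquad RT_n(b_0, b_0, h) \asymp \frac{1}{n} + \left(\frac{1}{n^3h^7}\right)^{\frac{4}{2d+4}}. \]
This solution fulfills the constraint $nb_0^dh^4 = O(1)$ when $n^{4-d}h^{d+16} = O(1)$. Hence we can conclude
that for $b_0^4 = O(1/(nb_0^d))$, the bandwidth $b_0^*$ satisfies
\[ b_0^* \asymp \max\left\{\left(\frac{1}{n^2h^3}\right)^{\frac{1}{d+4}}, \left(\frac{1}{n^3h^7}\right)^{\frac{1}{2d+4}}\right\}, \]
which leads to
\[ RT_n(b_0^*, b_0^*, h) \asymp \frac{1}{n} + \max\left\{\left(\frac{1}{n^2h^3}\right)^{\frac{4}{d+4}}, \left(\frac{1}{n^3h^7}\right)^{\frac{4}{2d+4}}\right\}. \]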

We now need to compare the solution $b_0^*$ with the candidate $b_{0*} = (u_n/n)^{1/(d+4)}$ obtained when
$nb_0^{d+4} \to \infty$. For this, we must compare the orders of $RT_n(b_0^*, b_0^*, h)$ and $RT_n(b_{0*}, b_{0*}, h)$.
Since $RT_n(b_0, b_0, h) \ge b_0^4$, we have $RT_n(b_{0*}, b_{0*}, h) \ge (u_n/n)^{4/(d+4)}$, so that, for $n$ large enough,



RT n (b* 07 b* ,h) < c 



RT n {bo* } 6o*5 ft) 



1 \ d + 4 / 1 \ 2d + 4 



n 

4(d+8) 



/ 1 \ d + 4 / ] \ (2d+4)(d+4) 

= (l) + - I-ot) =o(i), 

\ a n/ \nh d + 8 / 

using $u_n \to \infty$ and $n^{d+8}h^{7(d+4)} \to \infty$ by (H10). This shows that $RT_n(b_0^*, b_0^*, h) \le RT_n(b_{0*}, b_{0*}, h)$
for $n$ large enough. This ends the proof of the Theorem, since $b_0^*$ is the best candidate for the
minimization of $RT_n(\cdot, \cdot, h)$. □
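The two regimes distinguished in this proof can be tabulated for concrete values. The sketch below is hypothetical: the function names and the sample values of `n`, `h`, `d` are assumptions made for illustration. It evaluates the two candidate orders $(1/(n^2h^3))^{1/(d+4)}$ and $(1/(n^3h^7))^{1/(2d+4)}$ and checks that the first one is the larger exactly when $n^{4-d}h^{d+16} \ge 1$, the finite-sample counterpart of the condition $n^{4-d}h^{d+16} \to \infty$.

```python
def candidate_orders(n, h, d):
    """Return the two candidate orders for the bandwidth b0* (up to constants)."""
    first = (1.0 / (n ** 2 * h ** 3)) ** (1.0 / (d + 4))
    second = (1.0 / (n ** 3 * h ** 7)) ** (1.0 / (2 * d + 4))
    return first, second


def first_regime(n, h, d):
    """True when n^(4-d) h^(d+16) >= 1, i.e. when the first candidate dominates."""
    return n ** (4 - d) * h ** (d + 16) >= 1.0


# Arbitrary illustrative settings (assumptions, not values from the thesis).
results = []
for n, h, d in [(10_000, 0.5, 2), (10_000, 0.1, 2)]:
    first, second = candidate_orders(n, h, d)
    results.append((first_regime(n, h, d), first >= second, max(first, second)))
    print(n, h, d, first, second, first_regime(n, h, d))
```

The equivalence used here follows by raising both candidates to the power $(d+4)(2d+4)$ and simplifying, as in the "if and only if" statement of the proof.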

Proof of Theorem 4.3

The proof is the same as the one of Theorem 3.3 in Chapter 3. □

Proof of Theorem 4.4

The proof of the Theorem is based on the following Lemma. 

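Lemma 4.6. Define
\[ f_{in}(e) = \frac{1}{b_1^dh}\int \mathbf{1}(x\in\mathcal{X})\,K_1\left(\frac{X_i-x}{b_1}\right)K_2\left(\frac{Y_i-e-m(x)}{h}\right)dx. \]
Then, under (H1), (H6)-(H8), we have, for $b_1$ and $h$ going to 0 and for some constant $C > 0$,
\[
\begin{aligned}
\mathbb{E}f_{in}(e) ={}& f(e) + \frac{b_1^2}{2}\int \mathbf{1}(x\in\mathcal{X})\frac{\partial^2\varphi(x,e+m(x))}{\partial^2x}dx\int zK_1(z)z^{\top}dz \\
&+ \frac{h^2}{2}\int \mathbf{1}(x\in\mathcal{X})\frac{\partial^2\varphi(x,e+m(x))}{\partial^2y}dx\int v^2K_2(v)dv + o\left(b_1^2+h^2\right),
\end{aligned}
\]
\[ \mathrm{Var}\,f_{in}(e) = \frac{f(e)}{h}\int K_2^2(v)dv + o\left(\frac{1}{h}\right), \]
\[ \mathbb{E}\left|f_{in}(e) - \mathbb{E}f_{in}(e)\right|^3 \le \frac{Cf(e)}{h^2}\int\!\!\int \left|K_1(z_1)K_2(v_1)\right|^3dz_1dv_1 + o\left(\frac{1}{h^2}\right). \]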



This Lemma is proved in Appendix B. 



Let us now turn to the proof of Theorem 4.4. Observe that
\[ \hat f_{2n}(e) - \mathbb{E}\tilde f_{2n}(e) = \left(\tilde f_{2n}(e) - \mathbb{E}\tilde f_{2n}(e)\right) + \left(\hat f_{2n}(e) - \tilde f_{2n}(e)\right). \tag{4.6.8} \]



Let now $f_{in}(e)$ be as in Lemma 4.6 and note that $\tilde f_{2n}(e) = (1/n)\sum_{i=1}^n f_{in}(e)$. The second and the
third claims in Lemma 4.6 yield, since $h$ goes to 0 under (H10),

~ 3 



f in (e)-Ef in (e] 



Cnf(e 
h 2 



< 



|A" l C.)/v.>" l )| 3 2idi'i+o(i 



E"=i Var/ in (e) 



0(h) = o(l). 
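As a rough order-of-magnitude check on the Lyapounov condition, one can plug the leading orders of Lemma 4.6 into the ratio of the summed third absolute moments to the summed variances raised to the power 3/2. The sketch below is an assumption-laden illustration: it keeps only the leading terms (third moment of order $1/h^2$, variance of order $1/h$), sets the unknown constants to 1 by default, and uses a hypothetical function name. Under these simplifications the ratio behaves like $(nh)^{-1/2}$, so it vanishes exactly when $nh$ diverges.

```python
def lyapunov_ratio(n, h, third_const=1.0, var_const=1.0):
    """Ratio sum E|f_in - E f_in|^3 / (sum Var f_in)^(3/2) using leading orders only.

    third_const / h^2 stands for the third absolute moment of one summand and
    var_const / h for its variance (both constants are placeholders).
    """
    third_sum = n * third_const / h ** 2
    var_sum = n * var_const / h
    return third_sum / var_sum ** 1.5


# With unit constants the ratio is (n h)^(-1/2): here n h = 100.
ratio = lyapunov_ratio(10_000, 0.01)
# Multiplying n by 4 at fixed h halves the ratio.
ratio_more_data = lyapunov_ratio(40_000, 0.01)
print(ratio, ratio_more_data)
```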



Hence the Lyapounov Central Limit Theorem (Billingsley 1968, Theorem 7.3) gives, since $nh$
diverges under (H10),

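\[ \frac{\tilde f_{2n}(e) - \mathbb{E}\tilde f_{2n}(e)}{\sqrt{\mathrm{Var}\,\tilde f_{2n}(e)}} = \frac{\tilde f_{2n}(e) - \mathbb{E}\tilde f_{2n}(e)}{\sqrt{\mathrm{Var}\,f_{in}(e)/n}} \xrightarrow{d} N(0,1), \]
which yields, using the second result in Lemma 4.6,
\[ \sqrt{nh}\left(\tilde f_{2n}(e) - \mathbb{E}\tilde f_{2n}(e)\right) \xrightarrow{d} N\left(0, f(e)\int K_2^2(v)dv\right). \tag{4.6.9} \]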
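The limit in (4.6.9) depends on the kernel $K_2$ only through $\int K_2^2(v)dv$, while the bias expansion involves $\int v^2K_2(v)dv$. The sketch below computes both constants by a midpoint rule for the biweight kernel. The choice of the biweight kernel is an assumption made purely for illustration (the thesis only requires $K_2$ to satisfy (H8)), and the helper names are hypothetical. For this kernel the exact values are $\int K_2 = 1$, $\int K_2^2 = 5/7$ and $\int v^2K_2 = 1/7$.

```python
def biweight(v):
    """Biweight kernel (15/16)(1 - v^2)^2 on [-1, 1]; an illustrative choice only."""
    return 15.0 / 16.0 * (1.0 - v * v) ** 2 if abs(v) <= 1.0 else 0.0


def midpoint_integral(func, lo, hi, cells=20_000):
    """Composite midpoint rule for the integral of func over [lo, hi]."""
    width = (hi - lo) / cells
    return width * sum(func(lo + (k + 0.5) * width) for k in range(cells))


mass = midpoint_integral(biweight, -1.0, 1.0)
roughness = midpoint_integral(lambda v: biweight(v) ** 2, -1.0, 1.0)
second_moment = midpoint_integral(lambda v: v * v * biweight(v), -1.0, 1.0)


def approximate_variance(f_e, n, h):
    """Leading-order variance f(e) * int K_2^2 / (n h) suggested by (4.6.9)."""
    return f_e * roughness / (n * h)


print(mass, roughness, second_moment, approximate_variance(0.4, 1000, 0.1))
```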



Observe now that Lemma 4.1 gives, for $b_1 = b_0$,

1 



/2n(e) - /2n(e) 



1 1 f A • 1 



n/i 3 \ nbn I n nh 5 \ nbi 



1/2 



Of 



1 



n&f! 



1,4 



/7 7 



nb a n 



1/2 



Moreover, by Assumption (H11) we have $nb_0^{d+4} = O(1)$, and $nb_0^{2d} \to \infty$ under
(H9), so that $b_0^{dp}/(nb_0^{2d})^p = O(b_0^{dp})$ for $p = 2$. Therefore
1



1 







1 



n 3 b 2d h 7 



Hence, for b and h going to, it follows that 
sfnh f/ 2n (e) - f 2n (ej) x O 



1/2 



= P (1), 



since nb^h = o(l) and nb^h 3 —> oo by Assumption (H n ). Hence from (4.6.9) and (4.6.8), we deduce 

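\[ \sqrt{nh}\left(\hat f_{2n}(e) - \mathbb{E}\tilde f_{2n}(e)\right) \xrightarrow{d} N\left(0, f(e)\int K_2^2(v)dv\right). \]
This proves the Theorem, since the first result of Lemma 4.6 gives, for $b_1 = b_0$,
\[
\begin{aligned}
\mathbb{E}\tilde f_{2n}(e) ={}& f(e) + \frac{b_0^2}{2}\int \mathbf{1}(x\in\mathcal{X})\frac{\partial^2\varphi(x,e+m(x))}{\partial^2x}dx\int zK_1(z)z^{\top}dz \\
&+ \frac{h^2}{2}\int \mathbf{1}(x\in\mathcal{X})\frac{\partial^2\varphi(x,e+m(x))}{\partial^2y}dx\int v^2K_2(v)dv + o\left(b_0^2+h^2\right) := \bar f_{2n}(e).
\end{aligned}
\]
□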
Appendix B: Proof of Lemmas 4.3-4.6

Intermediate results for Lemmas 4.3-4.5

Lemma 4.7. If (H1)-(H2), (H7) and (H9) are satisfied, we have
\[ \sup_{x\in\mathcal{X}}\left|\hat g_n(x) - g(x)\right| = O_P\left(b_0^4 + \frac{\ln n}{nb_0^d}\right)^{1/2}, \qquad \sup_{x\in\mathcal{X}}\left|\frac{1}{\hat g_n(x)} - \frac{1}{g(x)}\right| = O_P\left(b_0^4 + \frac{\ln n}{nb_0^d}\right)^{1/2}. \]



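Lemma 4.8. Let $\mathbb{E}_{in}[\cdot]$ be the conditional mean given $(X_1, \ldots, X_n, \varepsilon_k, k \ne i)$. Then if (H1)-(H5),
(H8) and (H10) hold, we have, for any integer $i \in [1, n]$, $p \in [0, 6]$ and $y \in \mathbb{R}$,
\[ \left|\mathbb{E}_{in}\left[\varepsilon_i^pK_2^{(1)}\left(\frac{Y_i-y}{h}\right)\right]\right| \le Ch^2, \qquad \left|\mathbb{E}_{in}\left[\varepsilon_i^pK_2^{(1)}\left(\frac{Y_i-y}{h}\right)^2\right]\right| \le Ch, \tag{B.1} \]
\[ \left|\mathbb{E}_{in}\left[\varepsilon_i^pK_2^{(2)}\left(\frac{Y_i-y}{h}\right)\right]\right| \le Ch^3, \qquad \left|\mathbb{E}_{in}\left[\varepsilon_i^pK_2^{(2)}\left(\frac{Y_i-y}{h}\right)^2\right]\right| \le Ch, \tag{B.2} \]
\[ \left|\mathbb{E}_{in}\left[\varepsilon_i^pK_2^{(3)}\left(\frac{Y_i-y}{h}\right)\right]\right| \le Ch^3, \qquad \left|\mathbb{E}_{in}\left[\varepsilon_i^pK_2^{(3)}\left(\frac{Y_i-y}{h}\right)^2\right]\right| \le Ch, \tag{B.3} \]
for some constant $C > 0$.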



Let $\mathbb{E}_n[\cdot]$ and $\mathrm{Var}_n[\cdot]$ be respectively the conditional mean and the conditional variance
given $(X_1, \ldots, X_n)$, and denote $b_0 \vee b_1 = \max(b_0, b_1)$. In the following, $S_n$ and $T_n$ are defined as
in Lemma 4.3. Then the following results are used in the proofs of Lemmas 4.3, 4.4 and 4.5.

Lemma 4.9. If (H1)-(H10) hold, then
\[ \mathbb{E}_n[S_n] = O_P\left(b_0^2\right), \qquad \mathbb{E}_n[T_n] = O_P\left(b_0^4 + \frac{1}{nb_0^d}\right). \]



Lemma 4.10. Under (H1)-(H10), we have
Vat n [S n ] = O v (bivbf) 

Var n [T n ] = O P (b d \Jbf) 



nb(h 3 1 



1 

nbfh 5 





1 




nb d ) + 


nbq 


5 






1 




nbi n 2 b 2d h? 



Lemma 4.11. Define, for every integer $p$ in $[1, 3]$,
\[ U_n(x) = U_n(x; p) = \frac{1}{nb_1^dh^{p+1}}\sum_{i=1}^n\left(\hat m_{in}(x) - m(x)\right)^pK_1\left(\frac{X_i-x}{b_1}\right)K_2^{(p)}\left(\frac{Y_i-e-m(x)}{h}\right), \]
and assume that (H4) and (H7) hold. Consider $C$ large enough and any $x_1$, $x_2$ in $\mathcal{X}$ with $\|x_2 - x_1\| \ge
C\,b_0 \vee b_1$. Then $U_n(x_1)$ and $U_n(x_2)$ are independent given $X_1, \ldots, X_n$.
Lemma 4.12. Set



Pin{x) " nbtg n (x) 
Then, under (H1)-(H5) and (H7)-(H9), we have, for all integers $p_1$ and $p_2$ in $[0, 6]$,



n „ 

E / 1 ( xe x ) 

i=i j 

n „ 

Ey l(a;e^)E n 



= O p (nbf) (b 2 Pl 
nbf 



dx = Of 



(B.4) 
(B.5) 



The proofs of these lemmas are given in Appendix C.
Proof of Lemma 4.3

The proof follows directly from Lemmas 4.9 and 4.10. Indeed, by the Tchebychev inequality,
which ensures that
\[ A_n = O_P\left(\mathbb{E}_n[A_n] + \mathrm{Var}_n^{1/2}(A_n)\right), \]
Lemmas 4.9 and 4.10 give



Tn 



r 
Op 



i r. t i 



1/2 



2 ,4 



1/2 



which proves Lemma 4.3.



□ 



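Proof of Lemma 4.4

Set
\[ R_n = \int \mathbf{1}(x\in\mathcal{X})\,R_n(x)\,dx. \]
The proof of the Lemma proceeds by computing the conditional mean and the conditional variance
of $R_n$. For the conditional mean, define
\[ I_{in}(x) = \int_0^1(1-u)^2K_2^{(3)}\left(\frac{Y_i - e - m(x) - u\left(\hat m_{in}(x) - m(x)\right)}{h}\right)du, \]
\[ R_{in}(x) = \frac{1}{nb_1^dh^4}K_1\left(\frac{X_i-x}{b_1}\right)\left(\hat m_{in}(x) - m(x)\right)^3I_{in}(x). \]
This gives
\[ \mathbb{E}_n[R_n] = \sum_{i=1}^n\int \mathbf{1}(x\in\mathcal{X})\,\mathbb{E}_n[R_{in}(x)]\,dx, \tag{B.6} \]
where
\[ \mathbb{E}_n[R_{in}(x)] = \frac{1}{nb_1^dh^4}K_1\left(\frac{X_i-x}{b_1}\right)\mathbb{E}_n\left[\left(\hat m_{in}(x) - m(x)\right)^3\mathbb{E}_{in}[I_{in}(x)]\right]. \]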



Moreover, since by Lemma 4.8 (B.3) we have $\left|\mathbb{E}_{in}[I_{in}(x)]\right| \le Ch^3$,
it then follows, setting $p_1 = 3$ and $p_2 = 1$ in Lemma 4.12, that



|E„ [Rr, 



< 



< 



1 l—l 

~~1 P 

l h Y,Jt( x eX)E n 



(m in (x) - m(x)) 3 K\ ' 



dx 



C 
nbih 

C 



X; - x 



bi 



dx 



<> 



nbfh 
1 



Xj-x 
bi 



dx 



V V 0+ nb d 



1/2 



(B.7) 



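Consider now the conditional variance of $R_n$. Let $C$ be large enough and consider $x_1$, $x_2$ in
$\mathcal{X}$ with $\|x_2 - x_1\| \ge C\,b_0 \vee b_1$. Then, given $X_1, \ldots, X_n$ and under (H4), there exist two functions
$\Phi_{1n}$ and $\Phi_{2n}$ such that
\[ R_n(x_1) = \Phi_{1n}(\varepsilon_i, i \in I_1) \quad\text{and}\quad R_n(x_2) = \Phi_{2n}(\varepsilon_i, i \in I_2), \]
with an empty $I_1 \cap I_2$, since the kernel functions $K_0$ and $K_1$ are compactly supported. Hence
$R_n(x_1)$ and $R_n(x_2)$ are independent given $X_1, \ldots, X_n$, provided that $\|x_2 - x_1\| \ge C\,b_0 \vee b_1$, for $C$
sufficiently large. Therefore
\[
\begin{aligned}
\mathrm{Var}_n(R_n) &= \mathrm{Var}_n\left(\int \mathbf{1}(x\in\mathcal{X})\,R_n(x)\,dx\right) \\
&= \int\!\!\int \mathbf{1}\left((x_1,x_2)\in\mathcal{X}^2\right)\mathrm{Cov}_n\left(R_n(x_1), R_n(x_2)\right)dx_1dx_2 \\
&\le \int\!\!\int \mathbf{1}\left((x_1,x_2)\in\mathcal{X}^2, \|x_2-x_1\| < C\,b_0\vee b_1\right)\mathrm{Var}_n^{1/2}(R_n(x_1))\,\mathrm{Var}_n^{1/2}(R_n(x_2))\,dx_1dx_2 \\
&\le \frac{1}{2}\int\!\!\int \mathbf{1}\left((x_1,x_2)\in\mathcal{X}^2, \|x_2-x_1\| < C\,b_0\vee b_1\right)\left\{\mathrm{Var}_n(R_n(x_1)) + \mathrm{Var}_n(R_n(x_2))\right\}dx_1dx_2 \\
&\le C\,(b_0\vee b_1)^d\int \mathbf{1}(x\in\mathcal{X})\,\mathrm{Var}_n(R_n(x))\,dx,
\end{aligned}
\tag{B.8}
\]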



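where
\[
\int \mathbf{1}(x\in\mathcal{X})\,\mathrm{Var}_n(R_n(x))\,dx
= \sum_{i=1}^n\int \mathbf{1}(x\in\mathcal{X})\,\mathrm{Var}_n(R_{in}(x))\,dx
+ \sum_{1\le i_1\ne i_2\le n}\int \mathbf{1}(x\in\mathcal{X})\,\mathrm{Cov}_n\left(R_{i_1n}(x), R_{i_2n}(x)\right)dx.
\tag{B.9}
\]
For the conditional variances in (B.9), we have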
Var„ (i? irl (x)) < 



(nbfh 



1 n / X," — X N , ^ 

Kl —„ E r . 



61 



(m m (x) - m(x) f lf n (x 
with, applying Lemma 4.8 (B.3),
E 



(m in (x) - m(x)f If n (x) 



Er, 



< CE n 

< ChE, 



(m m (x) - m(x) f E in [lf n (x)] 

2" ~ 



(fhi n (x) — m(x)) 6 supE,; 



(fh in (x) - m(x)) 



0) (Yi-y 



du 



Hence from this result and Lemma 4.12, we deduce

n „ 

53 J t (x G X) Var„ (R m (x)) dx 



i=l ' 
< 

< 



Ch 



(nbfh 4 
Ch 



53 / 1 (x G X) E„ [(m m (x) - m(x)) 



OpM a 4 1 



Let us now turn to the sum of the conditional covariances in (B.9). We have



|Cov„ {R lin (x),R l2n (x))\ < Var^ 2 (R nn (x)) Var^ 2 (i? 22 „(a;)) 



where 



Hence 



Ch 

Var„ (i? n „(x)) < ( ^ fe4)2 E» 



(m iin (x) - m(x)) -ft' 2 



6i 



Op 



(nbfh 4 ) 2 



53 / 1 (x G A*) |Cov„ (R lin (x) , R i2 n{x))\ dx 



l<Zl <n 



53 j t(xeX) El/ 2 [(m lin (x) - m(x)) ( 

l<2l^Z2<71 



e!/ 2 



(m i2n (x) - m(x)) 



Xj\ ~ x 
~b7~ 



dx 



< 51 j t{xeX)E n \ [ {m Hn (x)~m(x)f 
+ 5Z Jt(xeX) E n [(m i2 „(a;) - m(x) f 

l<2l^22<?l 



Xjj — a; 



/V! 



6i 



6i 



(B.10) 



d:X 



dx. 

(B.ll) 
Moreover, under (H7), the change of variable $x = u + b_1X_{i_2}$ and Lemma 4.12 give



jt(xeX)E n (m iin (x) - m(x)Y 



X H - x 



X i2 - x 

h 



dx 



hi j t{u + b 1 X l2 eX)E n \{m lin {u + b 1 X l2 )-m(u + b 1 X l2 )f 



K x (u) K x 



X l2 - u + b 1 X l . 



= Op (nbf) J2 h(xeX) E„ [(m ln (x) - m(x)f 

n p 

= P (nbf) £ / 1 (a e *) {Pf n (x) + E [Zf n (x)]) 

i—1 



du 
Ki 



Xj-x 
h 

Xj - x 

h 



dx 



dx 



Therefore, collecting this result and (B.11), we arrive at



/ 1 (x £ X) Cov„ (R iin (x), R l2 n{x)) dx 



l<ii^Z2<n 



(nbfh 4 ) 2 \ nbfi 



Substituting this order and (B.10) in (B.9), it follows, since $nb_1^d \to \infty$ under (H10),

nbfh n 2 b\ d h ~ 



J t(xeX) Var„ (R n (x)) dx = O w 



(nbfh 4 ) 2 (nbfh 4 ) 2 

3 



Hence by (B.8), (B.7) and the Tchebychev inequality, we have 

. 3 hd \i hd 



nb$ 



1/2 



which proves the validity of the Lemma. □ 

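Proof of Lemma 4.5

Set
\[ P_n = \int \mathbf{1}(x\in\mathcal{X})\,P_n(x)\,dx. \]
The proof of the Lemma follows by computing the conditional mean and the conditional variance
of $P_n$. For the conditional mean, define
\[ I_{in}(x) = \int_0^1K_2^{(1)}\left(\frac{Y_i - e - \hat m_{in}(x) - t\left(\hat m_n(x) - \hat m_{in}(x)\right)}{h}\right)dt, \]
\[ P_{in}(x) = \frac{1}{nb_1^dh^2}\left(\hat m_n(x) - \hat m_{in}(x)\right)K_1\left(\frac{X_i-x}{b_1}\right)I_{in}(x). \]
Since
\[ \hat m_n(x) - \hat m_{in}(x) = \frac{Y_i}{nb_0^d\hat g_n(x)}K_0\left(\frac{X_i-x}{b_0}\right) \]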
n%g n [x) \ b 
and that $K_0$ is bounded under (H7), Lemma 4.7 gives



n « 

E„ [P„] = ^ / 1 (x £ X) E„ [P m (x)] dx 

i=l ^ 

1 - r 

1 1=1 



Moreover, observe that for any $y \in \mathbb{R}$,

41) f Y i-y 



Yili n (x) 



dx 



(B.12) 



E; 



h 



= m(Xi)Ei 



Kn 



(1) (Yi-y 



"Ein 



(1) 



K - 



Therefore, since $m(\cdot)$ is continuous on the compact support $\mathcal{X}$ of the $X_i$'s, Lemma 4.8 (B.1) yields



sup 



E; 



YiK 2 



(1) (Yi-y 



< Ch 2 , 



uniformly for $i \in [1, n]$. Hence conditioning with respect to $(X_1, \ldots, X_n, \varepsilon_k, k \ne i)$ yields that

.(i) (Yi - y s 



E„ 



Yj,Ii n (x) 



< 



SUp / E in 



YiK' 2 



dy 



< Ch 2 , 



for all $i$ and $x$. Combining this result with (B.12), we arrive at



E„[P„] = O f 



Xj - x 
h 



dx 



1 



(B.13) 



Let us now consider the conditional variance of $P_n$. Since
1 



Pin 



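and that $K_0(\cdot)$ and $K_1(\cdot)$ have compact supports under (H7) and (H8), it follows that $P_n(x_1)$
and $P_n(x_2)$ are independent given $X_1, \ldots, X_n$, provided that $\|x_2 - x_1\| \ge C\,b_0 \vee b_1$, for $C$ large
enough. Hence arguing as for (B.8) gives
\[ \mathrm{Var}_n(P_n) \le C\left(b_0^d \vee b_1^d\right)\int \mathbf{1}(x\in\mathcal{X})\,\mathrm{Var}_n(P_n(x))\,dx, \tag{B.14} \]
where
\[
\int \mathbf{1}(x\in\mathcal{X})\,\mathrm{Var}_n(P_n(x))\,dx
= \sum_{i=1}^n\int \mathbf{1}(x\in\mathcal{X})\,\mathrm{Var}_n(P_{in}(x))\,dx
+ \sum_{1\le i_1\ne i_2\le n}\int \mathbf{1}(x\in\mathcal{X})\,\mathrm{Cov}_n\left(P_{i_1n}(x), P_{i_2n}(x)\right)dx.
\tag{B.15}
\]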



For the conditional variances in (B.15), first note that 
1 



Var n (P m (x)) < 



(nbfh 2 ) 2 



Xj - x 
hi 



1 



(nb%) 2 gl(x 



Xj - x 
bo 



(y 4 -m(x)) 2 /? n (x) 



(B.16) 
Next, observe that for $X_i = z$ and $y \in \mathbb{R}$, and under (H1), (H3)-(H5) and (H7), we have



E r , 



ft 



(m(.) + e) 2 i^ ( m ^ + e-y 



f(e)de 



< Ch, 

uniformly in $x$ and $i$. From this result and the Hölder inequality, we deduce
E, 



< 



(i) (Yi - e + m in (x) -t(rh n (x) - m in (x)) 



dt 



< sup E„ 
Ch. 



Y/Kk 



(i) (Yi-y 



Hence by (B.16) and Lemma 4.7 we have, since $K_0(\cdot)$ is bounded under (H7),

'X, - x s 



so that 



Var„ (P in (x)) < {nb d h2)2 x (nh 2 d) 2^ n[x 

n „ 

5^ / 1 (x € X) Var„ (P in (x)) dx 



Of 



= Of 



h 



(nbfh 2 ) 2 J \(nb d ) 



w)tJ 1(XEX)Kl 



2 I -X-i X 



dx 



1 



nbfh 3 J \n 2 bf 



1 



(B.17) 



Let us now consider the sum of the conditional covariances in (B.15). We have, using the inequality
above,



\Cov n (P Hn (x),P l2n (x))\ < V&rl/ 2 (P Hn (x))VMl/ 2 {P t2n (x)) 



< 



C 



x 



(nbfh 2 ) 2 (nbl d ) 2 gfr(x) 



Ki 



Xi, — x\ ( Xi„ - x 
Hence from Lemma 4.7 we deduce



J t(xe X)\Cov n (P lin (x),P l2n (x))\dx 

E / 1 (* e *) 



= o f 
= Of 



1 \ / ft 



(nbfh 2 ) 2 J \(nb%) 2 
1 

n 2 b 2d h 3 



X il - x 



bx 



K1 



X i2 - x 



dx 



Substituting this order and (B.17) in (B.15), and using (B.14), (B.13) and the Tchebychev inequality,
we arrive at



Pn = Of 



= Of 



1 



1/2/1 / 1 



nbfh 3 \n 2 b 2d J n 2 b 2d h 3 



1 



1/2 



1 b d v bf ^ 1/2 

n 2 b 2d + n 2 b 2d h 3 . 
This ends the proof of the Lemma. □

Proof of Lemma 4.6

The first equality of the lemma is given by (4.6.7), since $\tilde f_{2n}(e) = (1/n)\sum_{i=1}^nf_{in}(e)$, so
that



E/ in (e) = E/ a „(e) 

= f( e ) + 7T / 1 G *) n , y —dx zK 1 {z)z T dz 

2 / a-^x 



/i 2 /" 
+ Y Jt( x eX) 



d (p(x, e + m(x)) 
d 2 y 



dx / u 2 i4r 2 («)^ + o(6? +/i 2 ) . 



For the second result of the Lemma, we have 



Var | f in (e) j l fl(< 
1 



E 



b\ d h? 



E 



1 (x e X) K x 



fin( e ) 
Xj - x 



K 2 



Yi — e — m(x) 
h 





2" 


j dx 





0(1). (B.18) 



Observe now that the changes of variables $x = x_1 + b_1z_1$ and $y_1 = e + m(x_1 + b_1z_1) + hv_1$ give



E 



i(xeX) K 1 



Xi - x 
b~i 



Yi — e — m(x) 

h 



x\ — x\ ( y% — e — m{x) 





2" 


j dx 





bi 



dx 



b\ d h I dxi 



K 2 (v 1 ) / 1 (xi + hzi G X) Jfi(«i)dsn 



ip (xi,e + m{x\ + b\Zi) + bivi) dv\. 

(B.19) 



Moreover, note that under (H3) and (H6) we have



\xi+b\Zi) = m(xi) + biZi / rrS 1 ^ (x\ + tb±Zi) dt 



f ^ dip 

(p(x 1 ,e + m(x 1 +b 1 z 1 ) + biv 1 ) = ip (xi, e + m{xi)) + biZi6 n (xi, z\) / (xi, 9 n (u, x\, z{j) du, 

Jo uy 



where 



Therefore 



l (x\,zi)= / rrS^ (x\ + tb\Zx) dt, 9 n (u, xi, zi) = e + m{x\) + uO n (xx, z{). 
Jo 



J dxi J K 2 {vi) J 1 (xi +b\Zi G X) K\(zi)dzi • 



ip (xi, e + m(xi + b\Zi) + b\Vi) dv\ 



dx\ 
dx\ 



K 2 ( Vl ) / 1 ( Xl + Mi G X) K 1 {z 1 )dz 1 



<f(xi,e + m(xi)) dvi + 0{b\) 



G X)K 2 (vi) / K 1 (z 1 )dz 1 



if (xi, e + m(xi)) dvi 



+ / dxi / 5 n (xi,vi)<p(xi,e + m(xi))dvi+0(bi), 



(B.20) 
where



K 2 (vi) J l(xi +b lZl S X)K 1 (z 1 )dz 1 - l( Xl € X)K 2 {v 1 ) J K 1 {z 1 )dz 1 



Applying the Lebesgue Dominated Convergence Theorem yields, for $b_1$ going to 0,



dx\ I S n (x\, vi) <p (xi, e + m{x\)) dv\ = o(l). 



Hence by (B.20) and (4.2.4), we have, since $\int K_1(z_1)dz_1 = 1$ under (H7),



J dxi J K 2 {vi) J 1 (xt + hzi <= X) Ktizifdzi 



<p(xi,e + m{x\ + bxzx) + hvi) dv\ 



K 2 (vi)dvi / 1 (x\ € X) <p (xi, e + m(xi)) dx\ + o(l) 



= / K*(v)dv + o(l). 



Combining this result with (B.19) and (B.18), we arrive at
\[ \mathrm{Var}\left(f_{in}(e)\right) = \frac{f(e)}{h}\int K_2^2(v)dv + o\left(\frac{1}{h}\right), \]
which proves the second result of the lemma.

The last statement of the Lemma is immediate. Indeed, the triangle and convexity inequalities
and the Lebesgue Dominated Convergence Theorem give, by (4.2.4),

~ 3 



E 



/ m (e)-E/ m (e) 
C 



< 



dx\ 



, . . x% — x \ ^. I yi — e — m(x) . , 
l(xeX)Ki\ -\ ) K 2 I ; ) dx 



bi 



b\ d h? 
Cb 3d h f f f 

j j \K 1 (z 1 )K 2 {vi)\ 3 dzxdvx + o f-^ I .l 
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dyi 



tp (xi, e + m(xi + b\Zi) + bivi) dv\ 
Appendix C

Proof of Lemma 4.7

See the proof of Lemma 3.1 in Chapter 3. □

Proof of Lemma 4.8

For the first bound in (B.1), set $f_p(e) = e^pf(e)$, and observe that if $X_i = x$, we have by (H4)
and the change of variable $e = y - m(x) + hv$,



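\[
\begin{aligned}
\mathbb{E}_{in}\left[\varepsilon_i^pK_2^{(1)}\left(\frac{Y_i-y}{h}\right)\right]
&= \mathbb{E}\left[\varepsilon^pK_2^{(1)}\left(\frac{\varepsilon+m(x)-y}{h}\right)\right] \\
&= \int K_2^{(1)}\left(\frac{e+m(x)-y}{h}\right)f_p(e)\,de = h\int K_2^{(1)}(v)f_p(y-m(x)+hv)\,dv.
\end{aligned}
\tag{C.1}
\]
Therefore, since $f_p$ has a bounded continuous derivative under (H5) and since $\int K_2^{(1)}(v)dv = 0$ under
(H8), the Taylor inequality gives
\[
\begin{aligned}
\left|\int K_2^{(1)}\left(\frac{e+m(x)-y}{h}\right)f_p(e)\,de\right|
&= h\left|\int K_2^{(1)}(v)\left[f_p(y-m(x)+hv) - f_p(y-m(x))\right]dv\right| \\
&\le h^2\sup_u\left|f_p^{(1)}(u)\right|\int\left|vK_2^{(1)}(v)\right|dv \\
&\le Ch^2,
\end{aligned}
\]
uniformly in $x \in \mathcal{X}$ and $y \in \mathbb{R}$. Hence from this inequality and (C.1), we deduce
\[ \left|\mathbb{E}_{in}\left[\varepsilon_i^pK_2^{(1)}\left(\frac{Y_i-y}{h}\right)\right]\right| \le Ch^2, \]
for any $y \in \mathbb{R}$. This proves the first inequality in (B.1). The second bound of (B.1) is immediate
under (H5) and (H8), since for any $x$ in $\mathcal{X}$, $\ell \in [1, 3]$ and $y \in \mathbb{R}$,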

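\[
\begin{aligned}
\left|\mathbb{E}_{in}\left[\varepsilon_i^pK_2^{(\ell)}\left(\frac{Y_i-y}{h}\right)^2 \,\Big|\, X_i = x\right]\right|
&= h\left|\int K_2^{(\ell)}(v)^2f_p(y-m(x)+hv)\,dv\right| \\
&\le h\sup_u|f_p(u)|\int K_2^{(\ell)}(v)^2dv \\
&\le Ch,
\end{aligned}
\tag{C.2}
\]
uniformly for $i$, $x$ and $y$. This proves (B.1).

The proof of the second inequalities of (B.2) and (B.3) follows from (C.2). The first bounds
in (B.2) and (B.3) are proved simultaneously. For any integer $\ell \in \{2, 3\}$ and $x \in \mathcal{X}$, we have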

,(2) f e + m(x) - y" 



E, 



Xi = x 



A' 



f p (e)de 



h I K^'(v)fp (y — m(x) + hv) dv. 



(C.3) 



Under (H8), the kernel function $K_2(\cdot)$ is symmetric, has a compact support and two continuous
derivatives, with $\int K_2^{(\ell)}(v)dv = 0$ and $\int vK_2^{(\ell)}(v)dv = 0$. Therefore, since $f_p$ has a bounded continuous
second order derivative by (H5), the second order Taylor expansion gives, for some $\theta = \theta(y, x, hv)$,

hfKl%)f p (v-m( X ) + hv) d v 

dv 

hvfM (y - m(x)) + '^-j {2) (y - m(x) + Ohv) dv 
v 2 K^\v)f^ 2) (y - m(x) + 9hv) dv 



h I K?>{v) 



h J K^'(v) 
h 3 



f P {y - m(x) + hv) - f p (y- m{x)) 
h 2 v 2 



2 

< Ch 3 



Hence from this bound and (C.3), we deduce 

E, 



em 



uniformly for i and y. This ends proof of the Lemma. 



< Ch 3 



□ 
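The first bound in (B.1) can be illustrated numerically. The sketch below makes several assumptions that are not in the thesis: $K_2$ is taken to be the biweight kernel, $p = 0$, the error density $f$ is standard normal, and $y - m(x) = 1$; the function names are hypothetical. It evaluates $h\int K_2^{(1)}(v)f(y - m(x) + hv)\,dv$ by a midpoint rule and checks that halving $h$ divides the value by about 4, as expected from an $O(h^2)$ bound.

```python
import math


def biweight_derivative(v):
    """Derivative of the biweight kernel (15/16)(1 - v^2)^2 on [-1, 1]."""
    return -15.0 / 4.0 * v * (1.0 - v * v) if abs(v) <= 1.0 else 0.0


def normal_density(u):
    """Standard normal density, used here as a stand-in for the error density f."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def smoothed_term(shift, h, cells=20_000):
    """Midpoint-rule value of h * int K_2'(v) f(shift + h v) dv over [-1, 1]."""
    width = 2.0 / cells
    total = 0.0
    for k in range(cells):
        v = -1.0 + (k + 0.5) * width
        total += biweight_derivative(v) * normal_density(shift + h * v)
    return h * width * total


shift = 1.0  # plays the role of y - m(x); an arbitrary illustrative value
value_h = smoothed_term(shift, 0.1)
value_half_h = smoothed_term(shift, 0.05)
print(value_h, value_half_h, value_h / value_half_h)
```

Because $\int K_2^{(1)}(v)dv = 0$ and $\int vK_2^{(1)}(v)dv = -1$, the leading term is $-h^2f'(y - m(x))$, which equals $h^2$ times the normal density at 1 in this example.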



Proof of Lemma 4.9



We have 



E„ [S n ] = Jt(xeX)E n [S n (x)] dx, E„ [T n ] = Jt(xeX)E n [T n (x)} dx, 



with 

E n [S n (x)} = 



1 i—l 



Xj - x 
h 



E n 



A 



(i) (Yj-e- m(x) 



1 

®n{T n (x)} = - b ^J2(ffn(x)+En[Z 2 n (x)])K 1 
1 i—l 



Xj-x 

h 



E, 



K, 



(2) 



Y - 



e — m(x) 



Observe first that under (H4), Lemma 4.8 (B.1) and Lemma 4.7 give



sup 



1 i—l x 



E„ 



A' 



(2) ( Yj-e- m(x) 
h 



Ch 3 

< -TT SUp 

h 6 xe x 



1 ^ KUKi(^) K f Xi -x 

<k k?„(x)) 2 1 



= Op 



and then 



1 i—l ^ ' 



(2) ( Yj-e- m,{x) 
h 



dx 



Consider now 



1 i—l 



Xj-x 



dx. 



which is such that, using Lemma 4.8, the equality and the bound above,



|E„ [S n ] I < CV n (1) , |E„ [T„] I < CV n (2) + Op 



1 
Since Lemma 4.12 (B.4) ensures that $V_n(p) = O_P\left(b_0^{2p}\right)$ for every integer $p \in [1, 6]$, it then
follows that
\[ \mathbb{E}_n[S_n] = O_P\left(b_0^2\right), \qquad \mathbb{E}_n[T_n] = O_P\left(b_0^4 + \frac{1}{nb_0^d}\right). \]
This proves the validity of the lemma. □

Proof of Lemma 4.10

Define $e_{in}(x) = \hat m_{in}(x) - m(x)$, which is such that
\[ U_n(x) = U_n(x; p) = \frac{1}{nb_1^dh^{p+1}}\sum_{i=1}^ne_{in}^p(x)\,K_1\left(\frac{X_i-x}{b_1}\right)K_2^{(p)}\left(\frac{Y_i-e-m(x)}{h}\right). \]
Let
\[ U_n(p) = \int \mathbf{1}(x\in\mathcal{X})\,U_n(x)\,dx, \]
so that $S_n = U_n(1)$ and $T_n = U_n(2)$. Observe now that Lemma 4.11 gives



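\[
\begin{aligned}
\mathrm{Var}_n(U_n(p)) &= \mathrm{Var}_n\left(\int \mathbf{1}(x\in\mathcal{X})\,U_n(x)\,dx\right) \\
&= \int\!\!\int \mathbf{1}\left((x_1,x_2)\in\mathcal{X}^2\right)\mathrm{Cov}_n\left(U_n(x_1), U_n(x_2)\right)dx_1dx_2 \\
&= \int\!\!\int \mathbf{1}\left((x_1,x_2)\in\mathcal{X}^2, \|x_2-x_1\| < C\,b_0\vee b_1\right)\mathrm{Cov}_n\left(U_n(x_1), U_n(x_2)\right)dx_1dx_2 \\
&\le \int\!\!\int \mathbf{1}\left((x_1,x_2)\in\mathcal{X}^2, \|x_2-x_1\| < C\,b_0\vee b_1\right)\mathrm{Var}_n^{1/2}(U_n(x_1))\,\mathrm{Var}_n^{1/2}(U_n(x_2))\,dx_1dx_2 \\
&\le \frac{1}{2}\int\!\!\int \mathbf{1}\left((x_1,x_2)\in\mathcal{X}^2, \|x_2-x_1\| < C\,b_0\vee b_1\right)\left\{\mathrm{Var}_n(U_n(x_1)) + \mathrm{Var}_n(U_n(x_2))\right\}dx_1dx_2 \\
&\le C\left(b_0^d\vee b_1^d\right)\int \mathbf{1}(x\in\mathcal{X})\,\mathrm{Var}_n(U_n(x))\,dx.
\end{aligned}
\tag{C.4}
\]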



Moreover, we have 



(n6f/i p+1 )' / 1 (x e A) Var n (C7„ (x)) dx 



J2 J 1 (x e X) K\ (^~) Var„ (W in (x;p)) dx 



l<Zl ^2 <n 



l(xeA)K 1 ( X \ X ) K x f Xt2 L * )Cov„ (W iin (x;p),W i2n (x;p))dx, 



61 



(C.5) 



where 



W in (x;p) = ef„ (x) if. 



(p) ( Y i-e- m ( x ) 
The first term in (C.5) yields, by Lemma 4.8 and Lemma 4.12,



J2jt(xeX) K\ Var„ (W in (x;p)) dx 



< E/ ! • • V: ' aJ ( A ' / , : ' I 



(p) / lj - e - m(x) 



AT' f A '. -| ,l.r 



< Chj^jtixtX) (ffi (x) + E„ [S^(x) 

Z — 1 



A' 



( p) ( Yj-e- m{x) 
h 



dx 



-l Xi X ' ,/,■ 



&1 



(C.6) 



For the sum of the conditional covariances in (C.5), set
W n (p)= E [t(xeX)K 



A, M ^ [__t — E ] Cov„ (W il7l (af;p), W i2n (x;p)) dx. 



We need to bound this term for $p \in [1, 2]$. Since



= ((3 m (x) + X m (x)f (or) X* 
the independence of the $Y_j$'s gives, for any $i_1 \ne i_2$,



(p ) (Yj-e- m{x) 



Cov„ (W iin (x; l),W t2n (x; 1)) 



Am(^)Cov„ 
+/3i 2 „(x)Cov 
+Cov 



(!) fY h -e- m{x) 



1 ^i2n { x )^2 



(1) fjjg-g - m ( X ) 



(C.7) 



Moreover, it is clear that the results of Lemma 4.8 remain valid with $\mathbb{E}_n[\cdot]$, since $\mathbb{E}_n[A] = \mathbb{E}_n[\mathbb{E}_{in}[A]]$,
where $\mathbb{E}_{in}[\cdot]$ represents the conditional mean given $(X_1, \ldots, X_n, \varepsilon_k, k \ne i)$. Therefore, since $K_0(\cdot)$
is bounded under (H7), this yields, by (H4) and Lemma 4.7,
-(I) f Y h -e-m(x) 



Pi in (x)Cov. 



(i) ( Y i2 -e-m(x) 



■K 



Xi X 



nb$g n (x) "V b o 
Or(^) W nn (x)\ 



E„ 



e h K 2 



- 1 h 

(i) (Y% x - e - m(x) 



E„ 



A' 



(1) - 6 - "^fa) 



(C.8) 
uniformly in $x$, $i_1$ and $i_2$. We also have



Gov*, 



E[ £ 2 ] 
(nb^g n (x)) 2 



(i) (Y^-e- m(x) 



(i) (Y l2 - e - m(x) 



E ^''' v " 



» 3 =l 
■3^*1 >*2 



En 



(i) /^ii - e - rn(x) 



E, 



(i) -e-m(a;)^ 



(nfeg<? n (x)) : 



X i2 - a; 



E, 



(i) (Y^-e- m(x) 



bo 



E, 



ti 2 rs. 2 



(i) / Y^ 2 - e - rn(x) 



This gives, by (H4), (H7), Lemma 4.7 and Lemma 4.8,
Cov 



Ow 



^i 1 n( x )X 2 
h" 



(i) Yj x ~ e ~ m(x) 



E ' ^ 3 ' 



i 3 =l 



Si 2 „(a;)i^ 2 



Op 



(1) /3^-e-m(x) 



Of 



nb', 



uniformly for any $i_1 \ne i_2$, $x_1$ and $x_2$. Collecting this result, (C.8) and (C.7), it follows, using Lemma
4.7 and taking $p_1 = 1$ in Lemma 4.12 (B.4),
W n (l) 



Op 



E 

+ E 

1 <i 1 7^22 <n 



1 (x e X) 



/3 iin (x)K 1 



Xi x - x 



x 



dx 



i(xex) 



X h - x 



bi 



X l2 ~ x 



61 



dx 



n „ 

< O r (nbf) E / 1 (« G *) 



i=l ■ 

= Or (n%\ d ) (bl) + P (n 2 6 2d ) = O p (n 2 6 2d ) . 
Combining this result with (C.6), (C.5) and (C.4), we arrive at
Var n (5„) = Var„ (Z7„(l)) 



X t - x 
61 



dx + Op (n 2 b\ d ^ 



< Or (&o V bi) x 
= Op {bt V 6?) x 
= Op(^V^) 



1 

(nbfh 2 )' 
1 



(nbfh 2 ) 



nbfh (% + JL) +Wn (i) 



nbfh ( b\ 



1 \ nb\ d tf 



1 



1 



nbfh 3 \° nb$, 
This proves the first result of the Lemma.

For the second, we also have by (C.4), (C.5) and (C.6),

nbfhlbt 



Var„ (T n ) = Var„ (U n (2)) = 

(nbfh 3 ) 

Hence the order of Var n (T n ) follows from the following result 



W„(2) 



W n (2) = Or 



h 6 h 3 



(C.9) 
Indeed, (C.9) and the equality before give



Var„ (T n ) 

Op (b d Q V bf) 
(nbfh 3 ) 2 

= P (b$Vbi) 



1 V 1 



2 ,4 



i&f/i 5 V ' nbi) ' nbi ' n 2 b% d h 3 



This yields the second result of the Lemma. We now prove (C.9). Observe that for $i_1 \ne i_2$, we have
Cov n (W hn (x;2),W i2n (x;2)) 



= Cov„ 
= /3? (x)Cov. 



+#L (x)Cov„ 



(2) fY^-e- m(x) 



(2) ^ y» 2 - e - m(a;) 
h 



+2^ in (x)ft 2 „(x)Cov r , 
+2/3 n „(x)/3? ( x )Cov ri 



(2) /^n - e- m(x) 



( 2 ) /y^-e - m(z) 



+2/3 H „(x)Cov„ 
+2/3 l2 „(x)Cov„ 
+4/3 lin (a;)/3 l2 „(a;)Cov„ 
+Cov„ 



(i) f Y^-^-m^) * 2 



(2) /li! - e - m(x) 



i ^'iin( a; )-^2 





/i 




y 2 - 


e — 


m(x) ' 




/i 




y 2 - 


e — 


m(x) 




/i 




n 2 - 


e — 


m(x) 



(CIO) 



The first two terms in (C.10) are treated similarly, since they are symmetric. Under (H4), we have,
for any $i_1 \ne i_2$,



Pl n (x)Cov r , 



(2) (Y ir -e-m{x)\ 2 ,„(2) /*i 2 -e-m(a;) 



^" (a;) 2 £ *o ( A '- r i <^,. 

(n6^ n (x)) i< l3#i2 <„ 



&o 



(2) (Y ix -e- m(x) 



2 ^(2) y» 2 - e - m(s) 

' fc i3^2 



/CO") 



7^2 



- x 



Cov„ 



(n^&Cr))* " V b 
with, using Lemma 4.8,

,(2) (Y ix -e- m(x) 



(2) (Y^-e- m(x) 



2 ^(2) - e - 

) £ ii JX 2 



Cov„ 



< 



E„ 



.2 ^(2) f*i a -e-m(x) 
' £11^2 



<K 2 



(2) ^y 4l - e - m(i) ^ K ( 2 ) {Y i2 -e- m(x) 



+ E n 
Ch e . 



<K 2 



(2) {Y i2 -e- m(x) 



E„ 



(2) fY^-e- m(x) 
Therefore, since $K_0(\cdot)$ is bounded under (H7), Lemma 4.7 gives



(2) (Y h -e-m{x)\ 2 (2) (Y i2 -e-m{x) 



— 2 

(n^g„(x)) 



ft 

&7 



= Of 



ft 



(C.ll) 



uniformly with respect to $i_1$, $i_2$ and $x$.

For the third and the fourth terms in (C.10), we also have, uniformly for $i_1$, $i_2$ and $x$,



L(a;)Cov„ 



0)#2 



(2) -e-m(x)\ (2 ) / Y h - e - m(x) 



,K 2 



&in{x)ff 2n (x) ( X n -X 



nb$g n (x) 
ft 6 



K 



bo 



E, 



f- K K 



(2) fY n - e - m{x) 



A' 



(2) (Yj^-e- m(x) 



= O 
Further, note that 



Piin(x)l3i n (x) \ . 



(C.12) 



/3 iin (x)Cov Tl 



(x)K, 



(2) / *ii 



K, - e - mfx) 



ft 



(2) (Y i2 -e- m{x) 



ft 



/3 iin (x)E n 



Z iin (x)% n (x)K' 2 



(2) fY^-e- m(x) \ K ( 2 ) (Y i2 -e- m(x) 



(2) /Fi! - e - m(x) 



E, 



(x)K 2 



ft 



(2) /Fi 3 - e - m(x) 



where 



1 



(2) fY i2 -e- m(x) 



K 



Xj i2 X 



E„ 



f- K ( 

t i2- ti 2 



(2) (Yj^-e- m(x) 



< 



Ch* 



nb^ \g n {x)\ 



Therefore by (H7) and Lemma 4.7 we have, uniformly for $i_1$, $i_2$ and $x$,



/?i in (a;)Cov n 
ft 3 



{x)K 2 



(2) { Y h - e - m(x)\ 2 ^x^(2) /*i 2 -e-m(x) 



ft 



Op 



ft 3 



< F 



?? ft; 
ft 3 

n 2 bl d 



(nb^g n (x)) 2 



E*3 



IA in (x)| , 



(C.13) 



We now treat the last two terms in (C.10). Observe that



Cov l; 



(2) (Y h -e- m{x) 



(x)K 2 

E[e 2 ] 
{nbig n {x)Y 

1 

{nb^g n (x)) 2 ^" V b 



> ^^2n(x)A^2 



(2) ( Y i2 -e- m(x) 
ft 



E ^ rV,; 



i 3 =l 
L 3^ l l ■' i 2 



E, 



(2) (Y^-e- m(x) 



E, 



A. 



(2) / F 2 - e ~ m(x) 



Zjj - .x 



E„ 



(2) fY^-e- m{x) 



xKn 



bo 



E„ 



e l2 K 2 



(2) /F 2 -e-m(x) 
This gives, by Lemma 4.7, Lemma 4.8 and uniformly with respect to $i_1 \ne i_2$ and $x$,



(x)(3 l2n (x)Cov 



( 2 ) (Yj^-e- m(x) 



V 
r 



Xi„ — x 



ft 
ft 6 



( 2 ) {Y i2 -e- m(x) 



(rib*) 2 



\Pi in (x)Pi 2 n(x)\ 



\Pi in {x)(ii 2 n{x)\ 



Moreover, 



Cov n 



-2 t„w&) ( Y ii -e-m(x) 



< 



(2) (Y h -e- m(x) \ K (2) (Y l2 -e- m(x) 



h 



E„ 



(2) /lij - e - m(x) 



ft 



E„ 



(2) / ^ 2 - e - m(x) 



ft 



with, using (H4), Lemma 4.8 and Lemma 4.7,
E 



(2) fYj-e- m(x) 



ft 



< 



E„ [54(a)] E„ IX 2 

^! _ j2 ki 

(nb$g n {x)) j=1 



(2) (Yj -e-m(x) 



Xj-x 



= O v 



ft 3 



uniformly for $i$ and $x$. Moreover, for the first term in Bound (C.10), we have



(2) f Y h -e- m(x) \ K (2) ( Y l2 - e - m{x) 



ft 



E, 



(2) fii 2 -£-m(a;) 



E; 



(2) / K tl - e - ro(s) 



ft 



where 



E, 



(2) (Yj^-e- m{x) 



< 



— 1 — 2 E *< : 

(n6g3„(a;)) i< l3 ^ 2 <„ 
Ch 3 



bo 



fc l3 /\ 2 



(2) / Y h - e - m(x) 



(C.14) 



(C.15) 



(C.16) 



n K (dn(x)) 

Therefore, since $K_1$ is bounded under (H7), it follows, by Lemma 4.7 and uniformly with respect
to $x$, $i_1$ and $i_2$,



E.,, 



ft 3 



(2 ) f Y h -e-m(x) \ K (2) f Y i2 -e- m{x) 



ft 



M^)M S U*)]=P* 



ft 3 



n 2 ^ J ' 
Hence from (C.16) and (C.15), we deduce
Cov. 



(2) fY tl -e-m(x)\ 2 (2 ) (Y i2 -e-m(x) 



Op 



uniformly in $x$, $i_1$ and $i_2$. Collecting this result, (C.13)-(C.14) and (C.11)-(C.12), it then follows
by Equality (C.10),

W n (2) 



7 

+o P 
+o P 
+o P 
+o P 



E 

l<ii 7^22 < n 



h 3 



n 2 b 2 d 



I (x e X) Ki 

i(xex) 

E / 1 ( x e *) 



X h - x 



X l2 - x 



E 

1<*1 <n 



ft 6 

ft 3 



n 2 ^ 



E JHx*X) 
t(xex) 

E / 1 (* € *) 



E 

l<ii^Z2<n 



Pi 1 n{x)Pi 2 n{x)K l 

Ki 



X n - x 



Cov„ (W iin (a;; 2), W l2n (x; 2)) 



X i2 - a; 



6i 



(2a; 



- a; 



#1 



X- n - a; 



dx 



X lx - x 



K x 



X l2 - x 



dx 



X il - x 



X h - x 



Moreover, note that for any integers $p_1$ and $p_2$ in $[0, 2]$,



E jnxzx) 



< 



X il - x 



h 

X i2 - x 
bi 



X i2 - x 



K 



X i2 - x 



dx. 



dx 



(C.17) 



E 

+ E 

l<2i^2 <n 



i (i e Af) 



Cn +P2 (^l 



Xjj - a; 



6i 



bi 



dx 
dx 



1 (x G X) 



Cn +P2 (^l 



X n - x 



Since (H-j) and Lemma 4.12 (B. 4 1 give, for p = p\ + P2 



E / 1 (* G x) 

bf J2 ! 1 ( u + b ^ e x) 



X (1 - a; 



6i 



X,;„ - x 



-^?2 ^ 



6i 



da; 



l<ii^2<n 



n „ 

= Op (nbf) ^ l(xeX) 
»=i ^ 

= P (n 2 ^)(^), 



X - a; 



X i2 -u - biXj. 
bi 



da 



dx 



It then follows, by this result, the bound above and (C.17), that

h 6 



W n (2) = Op 



This proves (C.9) and then completes the proof of the Lemma. □

Proof of Lemma 4.11

The lemma follows directly from the fact that, given $X_1, \ldots, X_n$, we have $U_n(x_1) = \Phi_{1n}(\epsilon_i, i \in I_1)$
and $U_n(x_2) = \Phi_{2n}(\epsilon_i, i \in I_2)$ with $I_1 \cap I_2$ empty, since the Kernel functions are
compactly supported and $\|x_2 - x_1\| \geq C\, b_0 \vee b_1$ for a sufficiently large $C$. □

Proof of Lemma 4.12



Define 



1 r ™ 

1 2—1 



61 



and 



which is such that, using Lemma 4.7,



A y (,-) = (m,.V / )-,M.r))/vo('^-^ 



I An (a:) I 



< sup 



< 



9n (x) 
P (I) 



E (A J (x)-E[A J (x)}) 

l<j^i<n 



E E tM*)] 

l<j^i<n 



u6q 



E (A^-EfA^s)]) 

l<j^i<n 



E E [AiW] 

l<j^?<n 



uniformly in x. This gives using the Markov Inequality which ensures that A n = Op (E |A n |), 



\Vn\ 



< 



0,(1) 1 



no, n( 



'0; i=l 



e y 1 (x g a-) 



xE 



X - x 



E e^w] 

l<J^i<T> 



< 



Op(l ) 
(nftg) 



|pr y I^g^/e 



y £ l (A j (x)--E[A j (x)]) 
3=2 



dx 
Pi 



EE[A,(x)] 

i=2 



pi • 



dx. (C.18)
We bound the two resulting integrals in (C.18). For the first, the Marcinkiewicz-Zygmund inequality
(see e.g. Chow and Teicher, 2003, p. 386), the Hölder and the Minkowski inequalities give



J i(x e x)E 



3=2 



dx 



< Jl(xe X)E 1/2 

< J t(x G X)E 1/2 



^(A^aO-ElA^Or)]) 



3=2 



2pi 



in 



dx 



^2(A j (x)-E[A j (x)]Y 



3=2 



dx 



J2(A j (x)-E[A j (x)]f 



3=2 



Pl/2 



dx 



Pl/2 



< lt(xeX)\j2 E 1M [| (A, (x) - E [Aj (x)]f 

.3=2 

Pl/2 

< C I 1 (x e x) E 1/Pl Af 1 (x) 



dx 



dx 



3=2 



2pi 



-, 1/ P1 -\ Pi/ 2 



g (z) dz 



C Jt(xeX)lj2 J [i m ( z ) - m O)) K o 



C J t (x e X) I &o J (( m ( x + b o u ) - m(a;)) K (u)) 2p g (x + b Q u) du 



dx 



C 



h d h 2pi 
°0°0 



1/p 



3=2 
Pl/2 



1/Pl 



Pl/2 



= o(M) l/2 &r) 



dx 



(C.19) 



For the second resulting integral in (C.18), we have, since the $A_j(x)$'s are identically distributed,



J t(xe x) 



X)E[Aj(x)] 

3=2 



< n Pl j t(xe X) 
Cn Pl 



b o / ( m I 2 - + boil) — m (x)) g (x + b^u) Kq (u) du 



dx 



(b«xb 2 r\=0(nb*+ 2 ) P \ 

using $\int u K_0(u)\,du = 0$ and the fact that, except for those $x$ at a distance $O(b_0)$ from the boundaries
of $\mathcal{X}$, we have for all $u$ in the support of $K_0(\cdot)$,

(m (x + boil) — m (x)) g (x + bou) 

= b ^m (1) (x) u T + b u J (1 - t) m {2) (x + tb u) dtu T ^j (^g (x) + b J (x + tb u) dtu 1 

Substituting the order in the bound above and (C.19) into (C.18), we obtain



V n = 



Or (I) 



(n P X) 1/2 K + K + T] = Or (#*) ,
since under (H g ), we have 6^/(n6^) p = O{b 2 p ), for all p in [0,6]. This proves (B.4).



Let us now turn to (B.5). The Hölder, the Marcinkiewicz-Zygmund and the Minkowski
inequalities give

' Xi - x N 



J l(sGAr)E r , 

i—1 



dx 



E 



\J \nblg n (x) 



E„ 



iff 2 



< 



C 



< 



E 

l<j^i<n 



y /" 1 (x £ *) El/2 

tj \<9n{x)\ Pl \ n 

c±i H r x i{ e 



X 3 ~ X 

bo 



2pi 



iff 2 



Xj - a; 
6i 



E ;^(' v ; 

l<j^-i<n 



6o 



< 



^E 



r<fc n 



-7 {nb*)P^\g, 

It then follows from Lemma 4.7 that

'Xi-x 



b 



Xj - x 



pi- 



in 



Pi/2 



iff 2 



Pt/2 



iff 2 



Pl/2 



iff 2 



dx 



dx 

Xj-x 
h 

Xj-x 
h 

Xi - x 



dx 



dx 



dx. 



Jt{x£X) E„ 



^(x)iff 2 



dx = Ow 



Op 



nbf 
nbf 

, d \Pl/2 



1 1 hi 



This proves (B.5) and completes the proof of the Lemma.



□ 



Chapter 5

Simulation study 



Abstract: In this chapter we present our numerical results. We analyze and compare the
performance of the Kernel density estimator $\hat f_{1n}$, based on the estimated residuals, with that of
the integral Kernel estimator $\hat f_{2n}$. This comparison is made in the univariate case with a quadratic
model, as described in the next section. The chapter is organized as follows. Section 5.1 is devoted
to the description of our simulation framework. Section 5.2 presents the global study of the
estimators $\hat f_{1n}$ and $\hat f_{2n}$. We compare in that section the performance of these estimators in the
sense of the Average Integrated Squared Error (AISE). Section 5.3 deals with the pointwise study
of our two Kernel estimators: it compares their Average Squared Error (ASE), their bias and their
variance, and investigates their asymptotic normality. Section 5.4 concludes.

5.1 Description of our simulation framework 

Let us consider the following quadratic model

$$Y = 3X^2 + 2X + 1 + \epsilon, \qquad (5.1.1)$$

where $\epsilon \sim N(0, 1)$ and $X \sim U[-1, 1]$. For our numerical study, we generate $T = 100$ independent
samples $(X_{k1}, \epsilon_{k1}), (X_{k2}, \epsilon_{k2}), \ldots, (X_{kn}, \epsilon_{kn})$, $k = 1, \ldots, T$, of size $n = 200$, from the model (5.1.1).
Define, for any integer $i \in [1, 200]$ and any integer $k \in [1, 100]$,

$$Y_{ki} = 3X_{ki}^2 + 2X_{ki} + 1 + \epsilon_{ki}.$$
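The sampling scheme above can be sketched in a few lines of Python. This is only an illustration: the function names, the seed and the use of the standard `random` module are assumptions made for the example, since the software used for the simulations is not stated here.

```python
import random


def regression_function(x):
    """True regression function m(x) = 3x^2 + 2x + 1 of model (5.1.1)."""
    return 3.0 * x * x + 2.0 * x + 1.0


def generate_samples(T=100, n=200, seed=0):
    """Return T independent samples, each a list of n triples (X, eps, Y).

    The seed is an illustrative choice, not a value taken from the text.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(T):
        sample = []
        for _ in range(n):
            x = rng.uniform(-1.0, 1.0)   # X ~ U[-1, 1]
            eps = rng.gauss(0.0, 1.0)    # eps ~ N(0, 1)
            sample.append((x, eps, regression_function(x) + eps))
        samples.append(sample)
    return samples
```

Calling `generate_samples()` with its defaults gives the $T = 100$ samples of size $n = 200$; a second call with a different seed would play the role of the independent samples used later for bandwidth selection.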

We denote by $\hat f^*_{1k}(\epsilon)$ and $\hat f^*_{2k}(\epsilon)$ the simulated versions of the estimators $\hat f_{jn}(\epsilon)$ ($j = 1, 2$) based
on the $k$th sample $(X_{k1}, \epsilon_{k1}), (X_{k2}, \epsilon_{k2}), \ldots, (X_{kn}, \epsilon_{kn})$. Hence the estimators $\hat f_{jn}(\epsilon)$ are approximated
by

$$\hat f_{jn}(\epsilon) = \frac{1}{T} \sum_{k=1}^{T} \hat f^*_{jk}(\epsilon), \quad j = 1, 2.$$

For the estimator $\hat f_{1n}(\epsilon)$, we do not truncate and take $\mathcal{X}_0 = [-1, 1]$.
We also denote by $\tilde f_{1n}(\epsilon)$ the Kernel estimator of $f(\epsilon)$ based on the true residuals, and by
$\tilde f_{2n}(\epsilon)$ the integral Kernel estimator of $f(\epsilon)$ based on the true regression function. That is,

$$\tilde f_{1n}(\epsilon) = \frac{1}{n b_1} \sum_{i=1}^{n} K_1\left( \frac{\epsilon_i - \epsilon}{b_1} \right),$$

$$\tilde f_{2n}(\epsilon) = \int \varphi_n(x, \epsilon + m(x))\,dx,$$

where $m(x) = 3x^2 + 2x + 1$, and $\varphi_n$ is defined as in Chapter 4. Hence we can approximate these
estimators by

$$\tilde f_{jn}(\epsilon) = \frac{1}{T} \sum_{k=1}^{T} \tilde f^*_{jk}(\epsilon), \quad j = 1, 2, \qquad (5.1.3)$$

where $\tilde f^*_{jk}(\epsilon)$ is the version of $\tilde f_{jn}(\epsilon)$ based on the $k$th generated sample.

For the choice of the Kernel functions $K_\ell$, $\ell = 0, 1, 2$, we consider the Epanechnikov Kernel
function

$$K_0(x) = K_1(x) = \frac{3}{4} (1 - x^2)\, \mathbb{1}(|x| \leq 1),$$

and the biquadratic (or biweight) Kernel function

$$K_2(x) = \frac{15}{16} (1 - x^2)^2\, \mathbb{1}(|x| \leq 1).$$
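As a quick sanity check on the two kernels, the sketch below verifies numerically that each integrates to one and computes the roughness $\int K^2(v)\,dv$, a quantity that enters the standardization used later in the chapter. It assumes the usual normalizing constants 3/4 and 15/16; the helper `integrate` is a hypothetical midpoint-rule routine written for this example.

```python
def epanechnikov(x):
    """Epanechnikov kernel: 3/4 (1 - x^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def biweight(x):
    """Biweight kernel: 15/16 (1 - x^2)^2 on [-1, 1], zero elsewhere."""
    return (15.0 / 16.0) * (1.0 - x * x) ** 2 if abs(x) <= 1.0 else 0.0


def integrate(func, lower=-1.0, upper=1.0, steps=20000):
    """Midpoint-rule approximation of the integral of func on [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


# Roughness of each kernel, i.e. the integral of K^2.
epanechnikov_roughness = integrate(lambda v: epanechnikov(v) ** 2)  # exact value 3/5
biweight_roughness = integrate(lambda v: biweight(v) ** 2)          # exact value 5/7
```

The exact roughness values follow from $\int_{-1}^{1} (1 - x^2)^2 dx = 16/15$ and $\int_{-1}^{1} (1 - x^2)^4 dx = 256/315$.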

Recall that the numerical value of $\hat f_{2n}(\epsilon)$ is approximated by the Riemann sum

$$\sum_{j=1}^{p} (x_j - x_{j-1})\, \varphi_n(x_j, \epsilon + \hat m_n(x_j)),$$

where $\{x_0, x_1, \ldots, x_p\}$ is a set of points such that $-1 = x_0 < x_1 < \ldots < x_p = 1$. In our setup, the
sequence $(x_j)$ is chosen such that $p = 100$ and

$$x_j = -1 + \frac{2j}{p}, \quad j = 1, \ldots, p.$$
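The grid and the Riemann approximation can be illustrated as follows. The sketch assumes an equally spaced grid on $[-1, 1]$ and a right-endpoint rule; since the joint density estimator of Chapter 4 is not reproduced here, the regression function $m$ of model (5.1.1) is used as a stand-in integrand, purely to show the mechanics.

```python
def grid_points(p=100, lower=-1.0, upper=1.0):
    """Return x_0, ..., x_p with x_j = lower + j (upper - lower) / p."""
    return [lower + j * (upper - lower) / p for j in range(p + 1)]


def riemann_sum(integrand, points):
    """Right-endpoint Riemann sum: sum over j of (x_j - x_{j-1}) * integrand(x_j)."""
    return sum(
        (points[j] - points[j - 1]) * integrand(points[j])
        for j in range(1, len(points))
    )


def regression_function(x):
    """m(x) = 3x^2 + 2x + 1, used here only as a test integrand."""
    return 3.0 * x * x + 2.0 * x + 1.0


# The exact integral of m over [-1, 1] is 4; the sum approaches it as p grows.
coarse = riemann_sum(regression_function, grid_points(p=100))
fine = riemann_sum(regression_function, grid_points(p=10000))
```

For a smooth integrand the right-endpoint error is of order $1/p$, which gives an idea of the numerical precision to expect with $p = 100$.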

5.2 Global study 

In nonparametric density estimation, it is known that a proper choice of the bandwidths
is crucial for the precision of the estimator. In our simulation setup, we first find the simulated
optimal bandwidths for the estimators $\hat f_{jn}$, $j = 1, 2$. To that aim, we apply the Mean
Integrated Squared Error (MISE) criterion, which consists in minimizing the quantities

$$\mathrm{MISE}(\hat f_{jn}) = E\left[ \int_{-A}^{A} \left( \hat f_{jn}(t) - f(t) \right)^2 dt \right],$$

where $[-A, A]$ is a set that contains all the simulated residuals. In this subsection, we suppose that
$[-A, A] = [-5, 5]$. For the sake of simplicity we assume that $h = b_1$ for the estimator $\hat f_{2n}(\epsilon)$. Using
the $T$ generated samples, we can approximate the MISE of the estimators $\hat f_{jn}$ by the simulated
Average Integrated Squared Error (AISE) defined as follows:

$$\mathrm{AISE}(\hat f_{jn}) = \mathrm{AISE}(\hat f_{jn})(b_1, b_0) = \frac{1}{T} \sum_{k=1}^{T} \int_{-A}^{A} \left( \hat f^*_{jk}(t) - f(t) \right)^2 dt.$$

Now for each $j$, we denote by $(b_{1j}, b_{0j})$ the optimal bandwidths that minimize the above AISE.
These bandwidths are obtained from $T = 100$ other independent samples of size $n = 200$ generated
from the model (5.1.1), distinct from the samples $(X_{k1}, \epsilon_{k1}), (X_{k2}, \epsilon_{k2}), \ldots, (X_{kn}, \epsilon_{kn})$.

In Figures 5.1 and 5.2 we plot the AISE of the estimators $\hat f_{1n}$ and $\hat f_{2n}$ when $b_0$ and $b_1$
vary on $[0.1, 1.1]$ in the set $\{h_j = 0.1 + 0.01 \times j,\ 1 \leq j \leq 100\}$. The first plot shows that the
optimal bandwidths $(b_{11}, b_{01})$ for the Kernel estimator $\hat f_{1n}$ are achieved when the couple
$(b_{11}, b_{01})$ is very close to $(1, 0.2)$, while the second plot reveals that $(b_{12}, b_{02})$ is achieved in
the neighborhood of $(0.2, 0.2)$.
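The AISE criterion and the grid search over bandwidths can be sketched as below. To keep the example self-contained, it treats only the simplest case, the Kernel estimator built from the true errors with a single bandwidth; the two-bandwidth search for the estimators based on estimated residuals follows the same pattern but needs the regression estimator, which is not reproduced here. All function names are hypothetical, and the integral is approximated by a Riemann sum on a user-supplied grid.

```python
import math
import random


def epanechnikov(u):
    """Epanechnikov kernel, 3/4 (1 - u^2) on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def normal_pdf(t):
    """Density f of the N(0, 1) errors in model (5.1.1)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def kernel_density(errors, t, bandwidth):
    """Kernel density estimate at t built from the true errors."""
    total = sum(epanechnikov((e - t) / bandwidth) for e in errors)
    return total / (len(errors) * bandwidth)


def integrated_squared_error(errors, bandwidth, grid):
    """Riemann approximation of the integral of (estimate - f)^2 over the grid."""
    step = grid[1] - grid[0]
    return step * sum(
        (kernel_density(errors, t, bandwidth) - normal_pdf(t)) ** 2 for t in grid
    )


def aise(samples, bandwidth, grid):
    """Average of the integrated squared errors over the T samples."""
    total = sum(integrated_squared_error(s, bandwidth, grid) for s in samples)
    return total / len(samples)


def best_bandwidth(samples, candidates, grid):
    """Candidate bandwidth with the smallest AISE."""
    return min(candidates, key=lambda b: aise(samples, b, grid))
```

With `candidates = [0.1 + 0.01 * j for j in range(1, 101)]` and a grid on $[-5, 5]$ this mirrors the search set described above, at the cost of a longer running time in pure Python.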

These graphical results about the bandwidths $(b_{1j}, b_{0j})$ are confirmed by the numerical
results of Table 5.1, in which we give the optimal bandwidths for the estimators $\hat f_{jn}$ and $\tilde f_{jn}$, and
their corresponding AISE. For the $(b_{1j}, b_{0j})$, we observe that $b_{01}$ is approximately as small as $b_{02}$,
while $b_{11}$ and $b_{12}$ are clearly different, as are $b_1$ and $b_2$. The results of Table 5.1 also reveal
that $\mathrm{AISE}(\hat f_{1n})(b_{11}, b_{01}) < \mathrm{AISE}(\hat f_{2n})(b_{12}, b_{02})$, $\mathrm{AISE}(\hat f_{2n})$ being approximately twice as big as
$\mathrm{AISE}(\hat f_{1n})$. This suggests that, for a judicious choice of the bandwidths $(b_0, b_1)$, the AISE
of the estimator $\hat f_{1n}$ is smaller than that of $\hat f_{2n}$. Consequently $\hat f_{1n}$ should be preferred to $\hat f_{2n}$
for the estimation of the p.d.f. of the residuals. Moreover, Table 5.1 shows that
$\mathrm{AISE}(\hat f_{1n})(b_{11}, b_{01}) \approx \mathrm{AISE}(\tilde f_{1n})(b_1)$ and that $\mathrm{AISE}(\hat f_{2n})(b_{12}, b_{02}) < \mathrm{AISE}(\tilde f_{2n})(b_2)$.

Table 5.1 - The optimal bandwidths $(b_{1j}, b_{0j})$, $b_j$ and their corresponding AISE when $b_0$ and $b_1$
vary on $[0.1, 1.1]$.

Estimator          Optimal bandwidths                   AISE
$\hat f_{1n}$      $b_{11} = 0.95$, $b_{01} = 0.19$     0.003141035
$\tilde f_{1n}$    $b_1 = 1.01$                         0.003083492
$\hat f_{2n}$      $b_{12} = 0.24$, $b_{02} = 0.17$     0.006217096
$\tilde f_{2n}$    $b_2 = 0.22$                         0.006406112



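Table 5.1 shows that the optimal first-step bandwidths for the estimators $\hat f_{1n}$ and $\hat f_{2n}$ are
small ($b_{01} = 0.19$ and $b_{02} = 0.17$), as recommended in Wang, Brown, Cai and Levine (2008).

We now define, for $\alpha = 0.05$ and $\alpha = 0.95$, the $\alpha$th confidence band $\hat f_{jn}(\cdot, \alpha)$ of the estimator
$\hat f_{jn}(\cdot)$ as follows. For each $j$ and any $\epsilon \in [-5, 5]$, we consider the $T$ ordered values $\hat f^*_{j,(k)}(\epsilon)$ of the
$\hat f^*_{jk}(\epsilon)$'s such that $\hat f^*_{j,(1)}(\epsilon) \leq \hat f^*_{j,(2)}(\epsilon) \leq \ldots \leq \hat f^*_{j,(T)}(\epsilon)$. Hence the function $\hat f_{jn}(\cdot, \alpha)$ is defined
as

$$\hat f_{jn}(\epsilon, \alpha) = \hat f^*_{j,(\alpha T)}(\epsilon), \quad \epsilon \in [-5, 5].$$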
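The band defined above is a pointwise order statistic of the $T$ replicated estimates. A minimal sketch, assuming that a non-integer $\alpha T$ is rounded up (the text only uses $\alpha T = 5$ and $\alpha T = 95$, so this convention is an assumption) and using a hypothetical function name:

```python
import math


def pointwise_band(replicated_values, alpha):
    """Return the (alpha * T)-th order statistic of T replicated estimates.

    replicated_values holds the T simulated values of the estimator at one
    fixed point; alpha is the level of the band (for example 0.05 or 0.95).
    """
    ordered = sorted(replicated_values)
    T = len(ordered)
    # Round before taking the ceiling to absorb floating-point noise in alpha * T.
    rank = math.ceil(round(alpha * T, 9))
    rank = min(max(rank, 1), T)  # keep the 1-based rank inside [1, T]
    return ordered[rank - 1]
```

Evaluating this function at each point of a grid on $[-5, 5]$, once with $\alpha = 0.05$ and once with $\alpha = 0.95$, gives the lower and upper curves of the band.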



Using the optimal bandwidths $(b_{1j}, b_{0j})$ described above, we represent in Figures 5.3 and 5.4 the
Average Kernel estimators $\hat f_{jn}$, the p.d.f. of $N(0, 1)$, and the 0.95th and the 0.05th confidence bands
of the estimators $\hat f_{jn}$, $j = 1, 2$. These plots give a general idea of the
confidence interval of the density $f$. For example, we see that for $\epsilon$ in the neighborhood of
0, we have $\hat f_{jn}(\epsilon, 0.05) \leq f(\epsilon) \leq \hat f_{jn}(\epsilon, 0.95)$.

In each of Figures 5.3 and 5.4, the bias of the estimated density is quite important around the
mode $\epsilon = 0$, but the true density function remains within the confidence band. We
also notice that the curves plotted in Figure 5.4 are less smooth than the ones represented in
Figure 5.3. This may explain the fact that $\mathrm{AISE}(\hat f_{1n})(b_{11}, b_{01}) < \mathrm{AISE}(\hat f_{2n})(b_{12}, b_{02})$.

Figure 5.3 - From top to bottom, the 0.95th confidence band of $\hat f_{1n}$, the p.d.f. of $N(0, 1)$, the
Average Kernel estimator $\hat f_{1n}$ and the 0.05th confidence band of $\hat f_{1n}$ when $b_1 = b_{11} = 0.95$ and
$b_0 = b_{01} = 0.19$.

[Plot legend: 95th confidence band; p.d.f. of N(0,1); Average Kernel density; 5th confidence band.]

Figure 5.4 - From top to bottom, the 0.95th confidence band of $\hat f_{2n}$, the p.d.f. of $N(0, 1)$, the
Average Kernel estimator $\hat f_{2n}$ and the 0.05th confidence band of $\hat f_{2n}$ when $b_1 = b_{12} = 0.24$ and
$b_0 = b_{02} = 0.12$.

[Horizontal axis: Residuals.]



5.3 Pointwise study 

In this section, we are interested in the pointwise study of the estimators $\hat f_{jn}(\epsilon)$ and $\tilde f_{jn}(\epsilon)$.
First, we compare the Average Squared Errors (ASE) of these estimators at the points $\epsilon = -1, 0, 1$.
Second, we compare their bias and variances, and
finally we investigate their asymptotic normality.

5.3.1 Comparison of the ASE

Let $(\hat f_{jn}(\epsilon), \hat f^*_{jk}(\epsilon))$ and $(\tilde f_{jn}(\epsilon), \tilde f^*_{jk}(\epsilon))$ be as in the previous section. We compare the
pointwise ASE of the estimators $\hat f_{jn}(\epsilon)$ to those of the estimators $\tilde f_{jn}(\epsilon)$. These ASE are defined
as

$$\mathrm{ASE}(\hat f_{jn})(\epsilon) = \frac{1}{T} \sum_{k=1}^{T} \left( \hat f^*_{jk}(\epsilon) - f(\epsilon) \right)^2,$$

$$\mathrm{ASE}(\tilde f_{jn})(\epsilon) = \frac{1}{T} \sum_{k=1}^{T} \left( \tilde f^*_{jk}(\epsilon) - f(\epsilon) \right)^2.$$

The comparison of the ASE is done at the points $\epsilon = -1, 0, 1$, using respectively the pointwise
optimal bandwidths

$$(b_{1j}(\epsilon), b_{0j}(\epsilon)) = \arg\min_{(b_1, b_0)} \mathrm{ASE}(\hat f_{jn})(\epsilon), \qquad b_j(\epsilon) = \arg\min_{b_1} \mathrm{ASE}(\tilde f_{jn})(\epsilon).$$

As in the global study, these bandwidths are based upon $T = 100$ new independent samples of
size $n = 200$ generated from the model (5.1.1), distinct from the samples that are used for
computing $\mathrm{ASE}(\hat f_{jn})(\epsilon, b_1, b_0)$ and $\mathrm{ASE}(\tilde f_{jn})(\epsilon, b_1)$. In this section, the minimizations of the ASE
are performed for $b_1$ and $b_0$ varying on $[0.1, 3]$, in the set $\{h_j = 0.1 + 0.01 \times j,\ 1 \leq j \leq 290\}$. For
$j = 1, 2$ and $\epsilon = -1, 0, 1$, the optimal values of $\mathrm{ASE}(\hat f_{jn})(\epsilon)$ and $\mathrm{ASE}(\tilde f_{jn})(\epsilon)$ are gathered in
Tables 5.2 and 5.3. These values show that for any $\epsilon = -1, 0, 1$,

$$\mathrm{ASE}(\hat f_{1n})(\epsilon, b_{11}, b_{01}) < \mathrm{ASE}(\hat f_{2n})(\epsilon, b_{12}, b_{02}). \qquad (5.3.4)$$



This fact parallels the results of the global study, in which we saw that for an optimal choice
of the bandwidths, the AISE of the estimator $\hat f_{1n}$ is smaller than that of the estimator $\hat f_{2n}$.
Consequently the pointwise estimator $\hat f_{1n}(\epsilon)$ should also be preferred to the estimator $\hat f_{2n}(\epsilon)$ for
the nonparametric Kernel estimation of $f(\epsilon)$.



Table 5.2 - ASE of $\hat f_{1n}(\epsilon)$ and $\tilde f_{1n}(\epsilon)$ using the bandwidths $(b_{11}(\epsilon), b_{01}(\epsilon))$ and $b_1(\epsilon)$.

                                 $\epsilon = -1$    $\epsilon = 0$    $\epsilon = 1$
$\mathrm{ASE}(\hat f_{1n})$      0.00020536762      0.0015443395      0.00013607338
$\mathrm{ASE}(\tilde f_{1n})$    0.00023502221      0.0011523854      0.00028682107

Table 5.3 - ASE of $\hat f_{2n}(\epsilon)$ and $\tilde f_{2n}(\epsilon)$ based on the bandwidths $(b_{12}(\epsilon), b_{02}(\epsilon))$ and $b_2(\epsilon)$.

                                 $\epsilon = -1$    $\epsilon = 0$    $\epsilon = 1$
$\mathrm{ASE}(\hat f_{2n})$      0.00086381543      0.0026912672      0.00092628778
$\mathrm{ASE}(\tilde f_{2n})$    0.0009047470       0.0025553406      0.0014950721

From Tables 5.2 and 5.3 we also notice that for $j = 1, 2$,

$$\mathrm{ASE}(\hat f_{jn})(0) \approx \mathrm{ASE}(\tilde f_{jn})(0), \qquad \mathrm{ASE}(\hat f_{jn})(\epsilon) < \mathrm{ASE}(\tilde f_{jn})(\epsilon), \quad \epsilon = -1, 1.$$

For the estimation of linear functionals of the error distribution in a semiparametric context, Müller,
Schick and Wefelmeyer (2004) have shown that the estimators using the estimated residuals may
have a smaller asymptotic variance compared to estimators that are based on the true errors.
A reason that may explain this effect is that the estimators $\tilde f_{jn}(\epsilon)$ do not use the fact that the
residuals have mean zero, contrary to the estimators $\hat f_{jn}(\epsilon)$. Note however that the improvement
of $\hat f_{1n}(\epsilon)$ on $\tilde f_{1n}(\epsilon)$ is much more clear-cut in the pointwise setup than in the global one.

Nevertheless, we observe that for the estimator $\hat f_{1n}$ based on the estimated residuals, the
values of the ASE are quite different at the points $\epsilon = -1$ and $\epsilon = 1$. We then attempt to explain
this situation by analyzing the behavior of the error terms around these points. Define, for any
integers $k \in [1, T]$ and $i \in [1, n]$,



$$\delta_{ki}(\epsilon) = (\hat\epsilon_{ki} - \epsilon_{ki})\, \mathbb{1}\left( |\hat\epsilon_{ki} - \epsilon| \leq b_{11}(\epsilon) \right),$$



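where $\hat\epsilon_{ki}$ is the Kernel empirical version of $\epsilon_{ki}$ based on the optimal first-step bandwidth $b_{01}(\epsilon)$
for the estimator $\hat f_{1n}(\epsilon)$. We then define the empirical mean $\bar\delta(\epsilon)$ and the empirical variance $\sigma^2_\delta(\epsilon)$
of the $\delta_{ki}(\epsilon)$'s as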

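$$\bar\delta(\epsilon) = \frac{1}{nT} \sum_{k=1}^{T} \sum_{i=1}^{n} \delta_{ki}(\epsilon), \qquad \sigma^2_\delta(\epsilon) = \frac{1}{nT} \sum_{k=1}^{T} \sum_{i=1}^{n} \left( \delta_{ki}(\epsilon) - \bar\delta(\epsilon) \right)^2.$$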



Table 5.4 - Values of the empirical means $\bar\delta(\epsilon)$ and the empirical variances $\sigma^2_\delta(\epsilon)$ for $\epsilon = -1, 1$.

                              $\epsilon = -1$    $\epsilon = 1$
$\bar\delta(\epsilon)$        -0.3911658         0.03403744
$\sigma^2_\delta(\epsilon)$   0.3331359          0.04659867



In Table 5.4 we evaluate the quantities $\bar\delta(\epsilon)$ and $\sigma^2_\delta(\epsilon)$, using the bandwidths $b_0 = b_{01}(\epsilon)$ and
$b_1 = b_{11}(\epsilon)$, $\epsilon = -1, 1$. We observe that the variables $\delta_{ki}(-1)$ have a lower (more negative) empirical
mean and a higher empirical variance than the data $\delta_{ki}(1)$. Hence around the point $\epsilon = -1$, the error
made when estimating the true residuals $\epsilon_{ki}$ by the nonparametric residuals $\hat\epsilon_{ki}$ is larger
than around the point $\epsilon = 1$. This may explain the difference of the ASE at the points $\epsilon = -1, 1$
for the estimator $\hat f_{1n}$, as seen in Table 5.2.



5.3.2 Comparison of the bias and variances 

In this subsection, we suppose that the estimators $\hat f^*_{jk}(\epsilon)$ and $\tilde f^*_{jk}(\epsilon)$ ($j = 1, 2$) defined
in the previous subsection are respectively based upon the optimal bandwidths $(b_{1j}(\epsilon), b_{0j}(\epsilon))$ and
$b_j(\epsilon)$. For each $j$, let $\hat B_{jn}(\epsilon)$ and $\tilde B_{jn}(\epsilon)$ be respectively the empirical bias of the estimated densities
$\hat f_{jn}(\epsilon)$ and $\tilde f_{jn}(\epsilon)$. These estimated quantities are defined as

$$\hat B_{jn}(\epsilon) = \frac{1}{T} \sum_{k=1}^{T} \left( \hat f^*_{jk}(\epsilon) - f(\epsilon) \right), \qquad \tilde B_{jn}(\epsilon) = \frac{1}{T} \sum_{k=1}^{T} \left( \tilde f^*_{jk}(\epsilon) - f(\epsilon) \right).$$

The simulated values of the bias $\hat B_{jn}(\epsilon)$ and $\tilde B_{jn}(\epsilon)$ at the points $\epsilon = -1, 0, 1$ are given in
Tables 5.5 and 5.6.



Table 5.5 - Optimal values of the bias $\hat B_{1n}(\epsilon)$ and $\tilde B_{1n}(\epsilon)$.

                            $\epsilon = -1$    $\epsilon = 0$    $\epsilon = 1$
$\hat B_{1n}(\epsilon)$     -0.005290204       -0.02450615       -0.005446647
$\tilde B_{1n}(\epsilon)$   -0.008126554       -0.01726208       -0.008392434

Table 5.6 - Optimal values of the bias $\hat B_{2n}(\epsilon)$ and $\tilde B_{2n}(\epsilon)$.

                            $\epsilon = -1$    $\epsilon = 0$    $\epsilon = 1$
$\hat B_{2n}(\epsilon)$     -0.01615247        -0.02447482       -0.01803243
$\tilde B_{2n}(\epsilon)$   -0.01661985        -0.02612883       -0.01262712



Tables 5.5 and 5.6 reveal that $|\hat B_{1n}(\epsilon)| < |\hat B_{2n}(\epsilon)|$ for $\epsilon = -1$ and $\epsilon = 1$, and that $|\hat B_{1n}(0)| \approx |\hat B_{2n}(0)|$. This
indicates that at the points $\epsilon = -1$ and $\epsilon = 1$, the estimator $\hat f_{1n}(\epsilon)$ is less biased than the
estimator $\hat f_{2n}(\epsilon)$.

Moreover, for $\epsilon = -1$ and $\epsilon = 1$, the estimator $\hat f_{1n}(\epsilon)$ is less biased than the estimator $\tilde f_{1n}(\epsilon)$.
Consequently, there is a positive influence of the bandwidth $b_{01}(\epsilon)$ on the bias of $\hat f_{1n}(\epsilon)$. But this
situation contrasts with the one observed at $\epsilon = 0$, for which $\hat f_{1n}(\epsilon)$ is more biased than $\tilde f_{1n}(\epsilon)$.

For $\hat f_{2n}(\epsilon)$ and $\tilde f_{2n}(\epsilon)$, we note that the biases of these estimators are very close at the points
$\epsilon = -1, 0, 1$. This means that the estimation of the regression function has a negligible impact on
the bias of the estimator $\hat f_{2n}(\epsilon)$.

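Now, let $\hat V_{jn}(\epsilon)$ and $\tilde V_{jn}(\epsilon)$ be the estimated variances of $\hat f_{jn}(\epsilon)$ and $\tilde f_{jn}(\epsilon)$, defined as

$$\hat V_{jn}(\epsilon) = \frac{1}{T} \sum_{k=1}^{T} \left( \hat f^*_{jk}(\epsilon) - \overline{\hat f}_{jn}(\epsilon) \right)^2, \qquad \tilde V_{jn}(\epsilon) = \frac{1}{T} \sum_{k=1}^{T} \left( \tilde f^*_{jk}(\epsilon) - \overline{\tilde f}_{jn}(\epsilon) \right)^2,$$

where

$$\overline{\hat f}_{jn}(\epsilon) = \frac{1}{T} \sum_{k=1}^{T} \hat f^*_{jk}(\epsilon), \qquad \overline{\tilde f}_{jn}(\epsilon) = \frac{1}{T} \sum_{k=1}^{T} \tilde f^*_{jk}(\epsilon).$$

The simulated values of these empirical parameters are gathered in Tables 5.7 and 5.8.

Table 5.7 - Optimal values of the variances $\hat V_{1n}(\epsilon)$ and $\tilde V_{1n}(\epsilon)$.

                            $\epsilon = -1$    $\epsilon = 0$    $\epsilon = 1$
$\hat V_{1n}(\epsilon)$     0.0001773813       0.0009437874      0.0001064074
$\tilde V_{1n}(\epsilon)$   0.0001689813       0.0008544052      0.000216388

Table 5.8 - Optimal values of the variances $\hat V_{2n}(\epsilon)$ and $\tilde V_{2n}(\epsilon)$.

                            $\epsilon = -1$    $\epsilon = 0$    $\epsilon = 1$
$\hat V_{2n}(\epsilon)$     0.0006029132       0.00209225        0.0006011193
$\tilde V_{2n}(\epsilon)$   0.0006285278       0.001872625       0.001335628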



From Tables 5.7 and 5.8 we notice that $\hat V_{1n}(\epsilon) < \hat V_{2n}(\epsilon)$ for $\epsilon = -1, 0, 1$. Consequently the estimator $\hat f_{1n}(\epsilon)$
should be preferred to $\hat f_{2n}(\epsilon)$, since the latter estimator is less efficient than the first one.

Moreover, we observe that $\hat V_{1n}(\epsilon)$ is much smaller than $\tilde V_{1n}(\epsilon)$ at $\epsilon = 1$, and close to $\tilde V_{1n}(\epsilon)$
when $\epsilon = -1$ and $\epsilon = 0$. This means that the estimation of the residuals may have a positive
influence on the final estimator $\hat f_{1n}(\epsilon)$.

For the variances $\hat V_{2n}(\epsilon)$ and $\tilde V_{2n}(\epsilon)$, it is seen that the first variance is much smaller than the second
at $\epsilon = 1$, and very close to $\tilde V_{2n}(\epsilon)$ when $\epsilon = -1$ and $\epsilon = 0$. Hence the estimation of the regression
function $m$ may have a positive effect on the estimator $\hat f_{2n}(\epsilon)$.
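The ASE, bias and variance tables can be cross-checked against each other, because with the $1/T$ normalization the average squared error splits exactly into squared bias plus variance. The sketch below applies this check to a few of the reported entries. It assumes that, in each table, the first value reported for a given $\epsilon$ belongs to the estimator based on estimated quantities; the dictionary keys and the function name are hypothetical.

```python
# (ASE, bias, variance) as reported in Tables 5.2/5.3, 5.5/5.6 and 5.7/5.8.
REPORTED = {
    ("hat_f_1n", -1): (0.00020536762, -0.005290204, 0.0001773813),
    ("hat_f_1n", 0): (0.0015443395, -0.02450615, 0.0009437874),
    ("hat_f_1n", 1): (0.00013607338, -0.005446647, 0.0001064074),
    ("hat_f_2n", -1): (0.00086381543, -0.01615247, 0.0006029132),
}


def decomposition_gap(ase, bias, variance):
    """Absolute difference between the ASE and bias^2 + variance."""
    return abs(ase - (bias ** 2 + variance))


# Gap for each reported triple; values near zero confirm the decomposition.
GAPS = {key: decomposition_gap(*values) for key, values in REPORTED.items()}
```

For the entries listed, the gaps are at the level of rounding error, which is consistent with the three sets of tables having been computed from the same replications and the same bandwidths.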

In conclusion, we note that at the points $\epsilon = -1, 0, 1$, the estimator $\hat f_{1n}(\epsilon)$ dominates
the estimator $\hat f_{2n}(\epsilon)$ for the ASE, the bias and the variance. As in the global study,
this suggests that the first estimator should be preferred to the second one in the pointwise
setting.

5.3.3 Asymptotic normality



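We examine here the asymptotic normality of the estimators $\hat f_{jn}(\epsilon)$, for $j = 1, 2$ and
$\epsilon = -1, 0, 1$. To that aim, we introduce the standardized variables

$$Z_{jn}(\epsilon) = \frac{\sqrt{n b_{1j}(\epsilon)}\, \left( \hat f_{jn}(\epsilon) - f(\epsilon) \right)}{\sqrt{f(\epsilon) \int K_1^2(v)\,dv}}, \qquad Z^*_{jk}(\epsilon) = \frac{\sqrt{n b_{1j}(\epsilon)}\, \left( \hat f^*_{jk}(\epsilon) - f(\epsilon) \right)}{\sqrt{f(\epsilon) \int K_1^2(v)\,dv}}, \quad k = 1, \ldots, T,$$

where the $\hat f^*_{jk}(\epsilon)$'s are defined as in the previous subsection, for the evaluation of the bias and variances.
The empirical mean $\hat\mu_{jn}(\epsilon)$ and the empirical variance $\hat\sigma^2_{jn}(\epsilon)$ of the $Z_{jn}(\epsilon)$'s are such that

$$\hat\mu_{jn}(\epsilon) = \frac{1}{T} \sum_{k=1}^{T} Z^*_{jk}(\epsilon), \qquad \hat\sigma^2_{jn}(\epsilon) = \frac{1}{T} \sum_{k=1}^{T} \left( Z^*_{jk}(\epsilon) - \hat\mu_{jn}(\epsilon) \right)^2.$$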



Are the data $Z_{jn}(\epsilon)$ normally distributed?

For each $j$ and $\epsilon$, we wish to test the hypothesis

$$H_{0j}(\epsilon): Z_{jn}(\epsilon) \sim N(\mu_j(\epsilon), \sigma_j^2(\epsilon))$$

versus

$$H_{1j}(\epsilon): Z_{jn}(\epsilon) \not\sim N(\mu_j(\epsilon), \sigma_j^2(\epsilon)),$$

where the parameters $\mu_j(\epsilon)$ and $\sigma_j^2(\epsilon)$ are unknown and have to be estimated. The normality
of the data $Z_{jn}(\epsilon)$ can be tested by an analytical method such as the Lilliefors method for the
Kolmogorov-Smirnov test. Let us perform this Lilliefors test of the hypothesis that the data $Z_{jn}(\epsilon)$ come from a
normal distribution. For this, we denote by $KS_j(\epsilon)$ and $p_j(\epsilon)$ respectively the Kolmogorov-Smirnov
statistic and the p-value of the above test. With the Lilliefors method, the evaluation
of the p-values $p_j(\epsilon)$ and the statistics $KS_j(\epsilon)$ accounts for the estimation of $\mu_j(\epsilon)$ and $\sigma_j^2(\epsilon)$. For
the characteristics and the properties of the KS or Lilliefors test, see Massey (1951), Shorack and
Wellner (1986), Dallal and Wilkinson (1986), Lehmann and Romano (1998), and Thode (2002). In
Table 5.9 we have gathered the numerical values of the $KS_j(\epsilon)$'s and the $p_j(\epsilon)$'s.

Table 5.9 - Values of the statistics $KS_j(\epsilon)$ and the p-values $p_j(\epsilon)$ of the $Z_{jn}(\epsilon)$'s.

                    $KS_1(\epsilon)$   $p_1(\epsilon)$   $KS_2(\epsilon)$   $p_2(\epsilon)$
$\epsilon = -1$     0.0506             0.9598            0.0427             0.9933
$\epsilon = 0$      0.0713             0.6891            0.0518             0.9516
$\epsilon = 1$      0.0746             0.6347            0.0882             0.4176



The results of Table 5.9 show that the hypothesis of normality of the data is not rejected, since
$p_j(\epsilon) > 0.05 = \alpha$ (a default value of the level of significance). Hence according to the Lilliefors
method, we can accept the fact that the data $Z_{jn}(\epsilon)$ come from a normal distribution.
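For readers who want to reproduce this kind of test, the statistic itself is easy to compute: it is the Kolmogorov-Smirnov distance between the empirical distribution function and the normal distribution whose mean and standard deviation are estimated from the data. The sketch below uses only the Python standard library; the function name is hypothetical, and the p-value, which requires the Lilliefors tables or a Monte Carlo approximation, is deliberately not computed.

```python
from statistics import NormalDist, fmean, stdev


def lilliefors_statistic(data):
    """KS distance between the data and a normal law with estimated parameters."""
    ordered = sorted(data)
    n = len(ordered)
    # Fit the normal reference distribution to the data themselves.
    fitted = NormalDist(fmean(ordered), stdev(ordered))
    distance = 0.0
    for i, value in enumerate(ordered):
        cdf = fitted.cdf(value)
        # The empirical c.d.f. jumps from i/n to (i+1)/n at the i-th order statistic.
        distance = max(distance, (i + 1) / n - cdf, cdf - i / n)
    return distance
```

As a rough guide, and as an assumption taken from the classical Lilliefors tables rather than from this chapter, the 5% critical value for samples larger than about 30 is approximately $0.886/\sqrt{n}$, that is about 0.089 for $T = 100$ replications.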

Besides the Lilliefors test, there exists a graphical method for investigating the normality
of the data: the normal Q-Q plot of the variables $Z_{jn}(\epsilon)$. The Q-Q plot provides
a graphical way to assess the degree of normality. If the data fall exactly along a reference line
(called Henry's line), then the hypothesis of normality is acceptable. If the empirical
data deviate widely from this line, the data are non-normal. In Figures 5.5, 5.6 and 5.7 we represent
the normal Q-Q plots of the data $Z_{1n}(\epsilon)$ and $Z_{2n}(\epsilon)$ for $\epsilon = -1, 0, 1$.



Figure 5.5 - From left to right: normal Q-Q plot of the data $Z_{1n}(-1)$ and $Z_{2n}(-1)$.

[Q-Q plot panels, each titled "Normal Q-Q Plot", with horizontal axis "Theoretical Quantiles".]

From the above figures, the hypothesis $H_{0j}(\epsilon)$ appears acceptable. However, we can have some doubts
about the symmetry of the $Z_{jn}(\epsilon)$'s, since we notice that they deviate slightly from Henry's
line at the tails of the distribution. This suggests that the distribution of the variables $Z_{jn}(\epsilon)$ may
not be symmetric.

Do the data $Z_{jn}(\epsilon)$ come from the standard normal $N(0, 1)$?
Since the data $Z_{jn}(\epsilon)$ are standardized variables, we can wonder whether they come from the normal
distribution $N(0, 1)$. To give some elements of answer to this question, we first compute the empirical
means and variances of the $Z_{jn}(\epsilon)$'s. These quantities are grouped in Table 5.10.



Table 5.10 - Values of the empirical means and variances of the $Z_{jn}(\epsilon)$'s.

                    $\hat\mu_{1n}(\epsilon)$   $\hat\sigma^2_{1n}(\epsilon)$   $\hat\mu_{2n}(\epsilon)$   $\hat\sigma^2_{2n}(\epsilon)$
$\epsilon = -1$     -0.1459                    0.4900                          -0.3105                    0.4098
$\epsilon = 0$      -0.5798                    0.6389                          -0.3124                    0.4380
$\epsilon = 1$      -0.2491                    0.4465                          -0.3097                    0.4323



This table shows that the empirical parameters $\hat\mu_{jn}(\epsilon)$ and $\hat\sigma^2_{jn}(\epsilon)$ are clearly different from 0 and 1,
which correspond respectively to the theoretical mean and variance of the normal $N(0, 1)$.

We now evaluate the empirical quantiles of the variables $Z_{jn}(\epsilon)$. For each $j$, we consider the
ordered values $Z^*_{j,(k)}(\epsilon)$ of the $Z^*_{jk}(\epsilon)$'s such that $Z^*_{j,(1)}(\epsilon) \leq Z^*_{j,(2)}(\epsilon) \leq \ldots \leq Z^*_{j,(T)}(\epsilon)$. Hence
for any $\alpha \in [0, 1]$, the $\alpha$th empirical quantiles of the $Z_{jn}(\epsilon)$'s are defined as $Z_{jn}(\epsilon, \alpha) = Z^*_{j,(\alpha T)}(\epsilon)$.
In Table 5.11 we give the simulated values of these quantiles when $\alpha = 0.05$ and $\alpha = 0.95$.

Table 5.11 - Values of the quantiles $a_j(\epsilon) = Z^*_{j,(0.05 \times T)}(\epsilon)$ and $c_j(\epsilon) = Z^*_{j,(0.95 \times T)}(\epsilon)$.

                    $a_1(\epsilon)$   $c_1(\epsilon)$   $a_2(\epsilon)$   $c_2(\epsilon)$
$\epsilon = -1$     -0.9719           0.6654            -0.9790           0.4006
$\epsilon = 0$      -1.6874           0.4734            -1.0893           0.4738
$\epsilon = 1$      -1.1377           0.5389            -1.1109           0.5071



From Table 5.11 we note first that the quantities $Z_{jn}(\epsilon, 0.05)$ and $Z_{jn}(\epsilon, 0.95)$ are, on the whole, clearly
different from $-1.64$ and $1.64$, the corresponding theoretical quantiles of the normal distribution
$N(0, 1)$. Next, the values of these quantiles show that the variables $Z_{jn}(\epsilon)$ are, on the whole, asymmetric.
This suggests that the $Z_{jn}(\epsilon)$'s are not distributed according to the normal variable $N(0, 1)$.

We now estimate the confidence intervals for the theoretical quantiles of the $Z_{jn}(\epsilon)$'s. This
estimation is done under $H_{0j}(\epsilon)$, and based on the following result. For $\alpha \in\, ]0, 1[$ and $T \to \infty$,

$$\frac{Z^*_{j,(\alpha T)}(\epsilon) - Q_{j,\alpha}(\epsilon)}{\sqrt{\alpha (1 - \alpha) / \left( T f^2(Q_{j,\alpha}(\epsilon)) \right)}} \xrightarrow{d} N(0, 1),$$

where $Q_{j,\alpha}(\epsilon)$ is the theoretical $\alpha$th quantile of the variable $Z_{jn}(\epsilon)$, and $f(\cdot)$ the p.d.f. of the normal
variable $N(0, 1)$. This result can be found, for example, in Tassi (1985). A consequence of such a
result is that an asymptotic confidence interval for the $Q_{j,\alpha}(\epsilon)$'s, with confidence level $1 - a$,
is given by

$$I_{j,\alpha}(\epsilon) = \left[ Z^*_{j,(\alpha T)}(\epsilon) - q_{a/2} \frac{\sqrt{\alpha (1 - \alpha)}}{\sqrt{T}\, f(Z^*_{j,(\alpha T)}(\epsilon))},\ Z^*_{j,(\alpha T)}(\epsilon) + q_{a/2} \frac{\sqrt{\alpha (1 - \alpha)}}{\sqrt{T}\, f(Z^*_{j,(\alpha T)}(\epsilon))} \right],$$

where $q_{a/2}$ denotes the $(1 - a/2)$ quantile of the standard normal distribution. In Tables 5.12
and 5.13 we give the estimations of the $I_{j,\alpha}(\epsilon)$ when $\alpha = 0.05$ and $\alpha = 0.95$, with $T = 100$ and
$n = 200$. As already seen in Table 5.11, the results of Tables 5.12 and 5.13 also reveal that the
quantiles $Q_{j,0.05}(\epsilon)$ and $Q_{j,0.95}(\epsilon)$ should be respectively quite different from $-1.64$ and $1.64$.
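The interval above is straightforward to evaluate. The following sketch, with a hypothetical function name, assumes a 95% confidence level (normal quantile close to 1.96), which is the value that matches the reported intervals, and takes $f$ to be the standard normal density as stated in the text.

```python
import math
from statistics import NormalDist


def quantile_confidence_interval(order_statistic, alpha, T, confidence=0.95):
    """Asymptotic confidence interval for the alpha-th theoretical quantile.

    order_statistic is the empirical quantile Z*_(alpha T); the density in the
    denominator is that of N(0, 1), evaluated at the empirical quantile.
    """
    std_normal = NormalDist()
    q = std_normal.inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = (
        q * math.sqrt(alpha * (1.0 - alpha))
        / (math.sqrt(T) * std_normal.pdf(order_statistic))
    )
    return order_statistic - half_width, order_statistic + half_width


# Example with the entry a_1(-1) = -0.9719 of Table 5.11 (alpha = 0.05, T = 100).
lower, upper = quantile_confidence_interval(-0.9719, 0.05, 100)
```

For this entry the computed bounds agree with the interval $[-1.143, -0.800]$ of Table 5.12 to within about 0.001; the same holds for $c_1(-1) = 0.6654$ and the interval $[0.5318, 0.7990]$ of Table 5.13.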



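Table 5.12 - Confidence intervals of the theoretical quantiles $Q_{j,\alpha}(\epsilon)$ when $\alpha = 0.05$.

                    $I_{1,\alpha}(\epsilon)$   $I_{2,\alpha}(\epsilon)$
$\epsilon = -1$     [-1.143, -0.800]           [-1.152, -0.806]
$\epsilon = 0$      [-2.132, -1.243]           [-1.283, -0.896]
$\epsilon = 1$      [-1.342, -0.933]           [-1.309, -0.913]

Table 5.13 - Confidence intervals of the theoretical quantiles $Q_{j,\alpha}(\epsilon)$ when $\alpha = 0.95$.

                    $I_{1,\alpha}(\epsilon)$   $I_{2,\alpha}(\epsilon)$
$\epsilon = -1$     [0.5318, 0.7990]           [0.3257, 0.5172]
$\epsilon = 0$      [0.3536, 0.5934]           [0.3549, 0.5941]
$\epsilon = 1$      [0.4159, 0.6635]           [0.3852, 0.6297]

The above results indicate that the data $Z_{jn}(\epsilon)$ are not distributed according to the
standard normal $N(0, 1)$. We then attempt to verify whether this situation is due to the influence of the
estimated first-step bandwidths $b_{0j}(\epsilon)$ on the variables $Z_{jn}(\epsilon)$. For this, we test the hypothesis
$\tilde Z_{jn}(\epsilon) \sim N(0, 1)$ versus $\tilde Z_{jn}(\epsilon) \not\sim N(0, 1)$, where

$$\tilde Z_{jn}(\epsilon) = \frac{\sqrt{n b_j(\epsilon)}\, \left( \tilde f_{jn}(\epsilon) - f(\epsilon) \right)}{\sqrt{f(\epsilon) \int K_1^2(v)\,dv}},$$

$\tilde f_{jn}(\epsilon)$ being as at the beginning of the subsection on the comparison of the bias and variances.
To perform the test, we consider $T$ independent replications

$$\tilde Z^*_{jk}(\epsilon) = \frac{\sqrt{n b_j(\epsilon)}\, \left( \tilde f^*_{jk}(\epsilon) - f(\epsilon) \right)}{\sqrt{f(\epsilon) \int K_1^2(v)\,dv}}, \quad k = 1, \ldots, T,$$

of the variables $\tilde Z_{jn}(\epsilon)$. We denote by $\tilde D_{jn}(\epsilon)$ the Kolmogorov-Smirnov statistic associated with
the test. For our goodness-of-fit test, the null hypothesis is rejected at the level of significance $\alpha$
if $\sqrt{T}\, \tilde D_{jn}(\epsilon) > K_\alpha$, where $K_\alpha$ satisfies

$$P\left( \sqrt{T}\, \tilde D_{jn}(\epsilon) \leq K_\alpha \right) = P\left( \tilde D_{jn}(\epsilon) \leq \frac{K_\alpha}{\sqrt{T}} \right) = 1 - \alpha.$$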



The tables of critical values of the goodness-of-fit test to the standard normal variable can be
found in the statistical literature. See, for example, Smirnov (1948), Miller (1956), Gibbons and
Chakraborti (2003). Some asymptotic approximations of the critical value $K_\alpha$,
based on the ratio $C_\alpha = K_\alpha / \sqrt{T}$, are:

$P(\tilde D_{jn}(\epsilon) > C_\alpha)$    0.20    0.15    0.10    0.05    0.01
$K_\alpha$                                  1.07    1.14    1.22    1.36    1.63
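The decision rule can be written down directly from the critical values listed above. The sketch below, which uses hypothetical function names and only the Python standard library, computes the Kolmogorov-Smirnov distance to the fully specified $N(0, 1)$ distribution and compares $\sqrt{T}$ times this distance with the tabulated $K_\alpha$.

```python
import math
from statistics import NormalDist

# Asymptotic critical values K_alpha, indexed by the significance level alpha.
CRITICAL_VALUES = {0.20: 1.07, 0.15: 1.14, 0.10: 1.22, 0.05: 1.36, 0.01: 1.63}


def ks_statistic_standard_normal(data):
    """KS distance between the empirical c.d.f. of the data and that of N(0, 1)."""
    ordered = sorted(data)
    n = len(ordered)
    reference = NormalDist()
    distance = 0.0
    for i, value in enumerate(ordered):
        cdf = reference.cdf(value)
        distance = max(distance, (i + 1) / n - cdf, cdf - i / n)
    return distance


def rejects_standard_normal(data, level=0.05):
    """True when sqrt(T) * D exceeds the tabulated critical value at this level."""
    scaled = math.sqrt(len(data)) * ks_statistic_standard_normal(data)
    return scaled > CRITICAL_VALUES[level]
```

Unlike the Lilliefors procedure, no parameter is estimated here: a sample that is normal but has a mean well away from 0 or a variance well away from 1 will typically be rejected once $T$ is moderately large.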



In Table 5.14 we give the values of the statistics $\sqrt{T}\, \tilde D_{jn}(\epsilon)$ for $T = 100$, $n = 200$, $j = 1, 2$
and $\epsilon = -1, 0, 1$. The results obtained here show that, at the level $\alpha = 0.05$, the null hypothesis
$\tilde Z_{jn}(\epsilon) \sim N(0, 1)$ is rejected, since $\sqrt{T}\, \tilde D_{jn}(\epsilon) > K_\alpha = 1.36$ for all $j$ and $\epsilon$.

Table 5.14 - Values of the statistics $\sqrt{T}\, \tilde D_{jn}(\epsilon)$ for $T = 100$, $j = 1, 2$ and $\epsilon = -1, 0, 1$.

                                        $\epsilon = -1$   $\epsilon = 0$   $\epsilon = 1$
$\sqrt{T}\, \tilde D_{1n}(\epsilon)$    3.159609          2.780215         2.465744
$\sqrt{T}\, \tilde D_{2n}(\epsilon)$    3.354464          2.676096         1.890398


We now attempt to explain the non-validity of the hypothesis $\tilde Z_{jn}(\epsilon) \sim N(0, 1)$ by computing
the empirical mean $\tilde\mu_{jn}(\epsilon)$ and the empirical variance $\tilde\sigma^2_{jn}(\epsilon)$ of the data $\tilde Z_{jn}(\epsilon)$.

Table 5.15 - Values of the empirical means and variances of the $\tilde Z_{jn}(\epsilon)$'s.

                    $\tilde\mu_{1n}(\epsilon)$   $\tilde\sigma^2_{1n}(\epsilon)$   $\tilde\mu_{2n}(\epsilon)$   $\tilde\sigma^2_{2n}(\epsilon)$
$\epsilon = -1$     -0.4402                      0.5322                            -0.4044                      0.4163
$\epsilon = 0$      -0.4300                      0.5743                            -0.2880                      0.4685
$\epsilon = 1$      -0.3274                      0.6124                            -0.0773                      0.5473



Table 5.15 shows that the estimated quantities $\tilde\mu_{jn}(\epsilon)$ and $\tilde\sigma^2_{jn}(\epsilon)$ are clearly different from 0 and 1.
This should explain the rejection of the hypothesis $\tilde Z_{jn}(\epsilon) \sim N(0, 1)$, as seen above. Hence the
results of our simulation study reveal that with the optimal bandwidths $(b_{0j}(\epsilon), b_{1j}(\epsilon))$ and
$b_j(\epsilon)$, the variables $Z_{jn}(\epsilon)$ and $\tilde Z_{jn}(\epsilon)$ are not distributed according to the standard distribution
$N(0, 1)$. However, the impact of the estimated optimal first-step bandwidths $b_{0j}(\epsilon)$ on the asymptotic
normality of the variables $Z_{jn}(\epsilon)$ may not be as important as suggested by the results obtained
with the data $Z_{jn}(\epsilon)$.

5.4 Conclusion

The aim of this chapter was to analyze and compare the performance of the Kernel
density estimator $\hat f_{1n}$, based on the estimated residuals, with that of the integral Kernel estimator
$\hat f_{2n}$. Several aspects have been noticed in our simulation study. First, in the global framework,
our numerical results show that the estimator $\hat f_{1n}$ should be preferred to the estimator $\hat f_{2n}$. The
reason is that the optimal AISE of the latter estimator is much higher than that of the first
estimator. For the evaluation of the bandwidths $(b_{1j}, b_{0j})$ that minimize the AISE of the estimators
$\hat f_{jn}$ ($j = 1, 2$), our numerical results indicate that $b_{01}$ is much smaller than $b_{11}$, and that $b_{02}$ is
approximately as small as $b_{12}$.

Next, for the pointwise study, which is made at the points $\epsilon = -1, 0, 1$, we observe that
$\hat f_{1n}(\epsilon)$ dominates $\hat f_{2n}(\epsilon)$ for the ASE and the variance at the three points, and for the bias at
$\epsilon = -1$ and $\epsilon = 1$.
Further, the ASE of the estimators $\hat f_{jn}(\epsilon)$ are nearly the same as those of the estimators $\tilde f_{jn}(\epsilon)$
for $\epsilon = 0$, and lower than the ASE of $\tilde f_{jn}(\epsilon)$ for $\epsilon = -1$ and $\epsilon = 1$. In a semiparametric context,
Müller, Schick and Wefelmeyer (2004) have shown that for the estimation of linear functionals of the
error distribution, the estimators that use the estimated residuals may have a smaller asymptotic
variance compared to the estimators based upon the true errors. Some of our simulation results
suggest that a similar conclusion may hold when estimating the p.d.f. of regression residuals. In
fact, for $\epsilon = 1$, the variances of the estimators $\tilde f_{jn}(\epsilon)$ are higher than those of the estimators
$\hat f_{jn}(\epsilon)$. This suggests that the estimation of the first-step bandwidth $b_0$ may have a positive influence
when estimating $f(\epsilon)$.

The study of the asymptotic normality of the standardized variables $\hat Z_{jn}(e)$ and $\tilde Z_{jn}(e)$, based on the density estimators $\hat f_{jn}(e)$ and $\tilde f_{jn}(e)$, reveals that the data $\hat Z_{jn}(e)$ and $\tilde Z_{jn}(e)$ are normal, but do not follow the standard normal distribution. This means that the approximation of these variables by the $N(0, 1)$ distribution is not satisfactory for small sample sizes ($n = 200$ in our framework). It will therefore be interesting, in future work, to use the bootstrap method to obtain an alternative approximation of the distribution of the considered variables. This will be one of the main directions of our future research, as described at the end of this thesis.



Chapter 6

Appendix 



Abstract: This chapter collects some results which are of independent interest and are used in Chapter 3 and Chapter 4. We begin with the Lyapounov Central Limit Theorem for triangular arrays, which is used, for example, in the proofs of Proposition 3.1 and Theorem 3.4. We also recall Theorem 1 and Theorem 2 of Einmahl and Mason (2005). These results are needed in the proof of Lemma 3.1. We conclude with Theorem 2 of Whittle (1960) and the Marcinkiewicz-Zygmund inequality (see e.g. Chow and Teicher 2003, p. 386), which are very useful for proving Lemma 3.10 and Lemma 4.12.

6.1 Lyapounov's Central Limit Theorem 

For each integer $n \geq 1$, let $\{X_{1n}, X_{2n}, \ldots, X_{nn}\}$ be a collection of random variables such that $X_{1n}, X_{2n}, \ldots, X_{nn}$ are independent. Then $\{X_{1n}, X_{2n}, \ldots, X_{nn}\}$ is called a triangular array of independent variables.

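Theorem 6.1. (Lyapounov's Theorem)

For every integer $n \geq 1$, assume that the variables $X_{in}$, $1 \leq i \leq n$, are independent with $E[X_{in}] = 0$ for all $i$. Let $\sigma_n^2 = \sum_{i=1}^n \mathrm{Var}(X_{in})$. If there exists $\delta > 0$ such that

\[ \lim_{n \to \infty} \sigma_n^{-(2+\delta)} \sum_{i=1}^n E\left[ |X_{in}|^{2+\delta} \right] = 0, \]

then

\[ \frac{X_{1n} + X_{2n} + \ldots + X_{nn}}{\sqrt{\sum_{i=1}^n \mathrm{Var}(X_{in})}} \xrightarrow{d} N(0, 1) \]

when $n \to \infty$.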



This result can be found, for example, in Billingsley (1968, Theorem 7.3). 
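As an illustration of the Lyapounov condition, the following Python sketch considers a hypothetical triangular array that is not taken from this thesis: $X_{in}$ uniform on $[-i/n, i/n]$, for which $\mathrm{Var}(X_{in}) = (i/n)^2/3$ and $E|X_{in}|^q = (i/n)^q/(q+1)$. It evaluates the Lyapounov ratio with $\delta = 1$ for increasing $n$ and simulates the standardized row sums; the seed and sample sizes are arbitrary assumptions.

```python
import math
import random
import statistics


def variance_sum(n):
    """Sum over i of Var(X_in) for X_in uniform on [-i/n, i/n]."""
    return sum((i / n) ** 2 / 3.0 for i in range(1, n + 1))


def lyapounov_ratio(n, delta=1.0):
    """(sum_i Var X_in)^{-(2+delta)/2} * sum_i E|X_in|^{2+delta}."""
    q = 2.0 + delta
    abs_moment_sum = sum((i / n) ** q / (q + 1.0) for i in range(1, n + 1))
    return abs_moment_sum / variance_sum(n) ** (q / 2.0)


def standardized_sum(n, rng):
    """One draw of (X_1n + ... + X_nn) / sqrt(sum_i Var X_in)."""
    total = sum(rng.uniform(-i / n, i / n) for i in range(1, n + 1))
    return total / math.sqrt(variance_sum(n))


rng = random.Random(1)  # fixed seed, arbitrary choice
ratios = [lyapounov_ratio(n) for n in (10, 100, 1000)]
draws = [standardized_sum(200, rng) for _ in range(2000)]

print("Lyapounov ratios:", [round(r, 4) for r in ratios])
print("mean of standardized sums:", round(statistics.fmean(draws), 3))
print("variance of standardized sums:", round(statistics.variance(draws), 3))
```

For this array the ratio decreases at the order $n^{-1/2}$, so the condition holds with $\delta = 1$, and the simulated standardized sums should have mean close to 0 and variance close to 1.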



6.2 Uniform in bandwidth consistency of kernel-type function estimators 




In this section, we give two results concerning the uniform in bandwidth consistency of kernel-type estimators, such as the density estimator and the regression function estimator. The results stated here are established in Einmahl and Mason (2005). They are key ingredients of our main results in Chapter 3 and Chapter 4.

The first result concerns the kernel density estimator. Let $X_1, X_2, \ldots, X_n$ be i.i.d. $\mathbb{R}^d$-valued random variables, $d \geq 1$, and assume that the common distribution function of the variables has a Lebesgue density function, which we denote by $f$. The kernel density estimator of $f$ based upon the sample $X_1, X_2, \ldots, X_n$, a kernel function $K$ and a bandwidth $0 < h = h(n) < 1$ is defined as

\[ \hat f_{n,h}(x) = \frac{1}{nh} \sum_{i=1}^n K\left( \frac{x - X_i}{h^{1/d}} \right), \quad x \in \mathbb{R}^d. \]
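The definition above, with its bandwidth entering as $h^{1/d}$ inside the kernel and as $h$ in the normalization, can be sketched in Python as follows. The product triangular kernel, the bivariate normal sample, the value $h = 0.25$ and the seed are all assumptions made for this illustration; they are not taken from the thesis.

```python
import math
import random


def triangular_kernel(u):
    """Product kernel on [-1/2, 1/2]^d; each factor is 2 * (1 - 2|u_k|).

    It is symmetric, continuous, supported on [-1/2, 1/2]^d and integrates to 1.
    """
    value = 1.0
    for u_k in u:
        if abs(u_k) >= 0.5:
            return 0.0
        value *= 2.0 * (1.0 - 2.0 * abs(u_k))
    return value


def kernel_density(x, sample, h):
    """(n h)^{-1} * sum_i K((x - X_i) / h^{1/d}) for points of dimension d."""
    d = len(x)
    scale = h ** (1.0 / d)
    total = sum(
        triangular_kernel([(x_k - xi_k) / scale for x_k, xi_k in zip(x, xi)])
        for xi in sample
    )
    return total / (len(sample) * h)


rng = random.Random(2)  # fixed seed, arbitrary choice
# Hypothetical sample: 5000 draws from the standard normal distribution on R^2.
sample = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(5000)]
estimate = kernel_density((0.0, 0.0), sample, h=0.25)
true_value = 1.0 / (2.0 * math.pi)
print("estimate at the origin:", round(estimate, 3), "true density:", round(true_value, 3))
```

With this parametrization the volume of the kernel window is proportional to $h$ whatever the dimension, which is why the normalization is $1/(nh)$ rather than $1/(nh^d)$.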



For any function $G$ defined and bounded on $\mathbb{R}^d$, we denote by $\|G\|_\infty$ the uniform norm of $G$, that is,

\[ \|G\|_\infty = \sup_{x \in \mathbb{R}^d} |G(x)|. \]

The following theorem is due to Einmahl and Mason (2005, p. 1382).

Theorem 6.2. Assume that the kernel function $K$ is symmetric, continuous over $\mathbb{R}^d$, with support contained in $[-1/2, 1/2]^d$ and $\int K(x)\,dx = 1$. If the density function $f$ is continuous and bounded on its support, then we have for any $C > 0$, with probability 1,

\[ \limsup_{n \to \infty} \sup_{C(\ln(n)/n) \leq h \leq 1} \|\hat f_{n,h} - E\hat f_{n,h}\|_\infty = O\left( \sqrt{\frac{\ln(1/h) \vee \ln(\ln n)}{nh}} \right). \]



Remark: Choosing a sequence $h = h(n)$ satisfying $nh/\ln n \to \infty$ and $\ln(1/h)/\ln(\ln n) \to \infty$, one obtains, with probability 1,

\[ \|\hat f_{n,h} - E\hat f_{n,h}\|_\infty = O\left( \sqrt{\ln(1/h)/(nh)} \right), \]

which is Theorem 1 of Giné and Guillou (2005).
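To give an idea of the order of magnitude involved, the following Python sketch evaluates the rate $\sqrt{(\ln(1/h) \vee \ln(\ln n))/(nh)}$ along the bandwidth sequence $h = n^{-1/5}$. This sequence is a hypothetical choice made for the illustration; it satisfies both conditions of the Remark, although $\ln(1/h)/\ln(\ln n)$ grows very slowly.

```python
import math


def uniform_rate(n, h):
    """sqrt((ln(1/h) v ln(ln n)) / (n h)), the order appearing in Theorem 6.2."""
    numerator = max(math.log(1.0 / h), math.log(math.log(n)))
    return math.sqrt(numerator / (n * h))


for n in (10**3, 10**4, 10**5, 10**6):
    h = n ** (-0.2)  # hypothetical bandwidth sequence h = n^{-1/5}
    print(
        f"n = {n:>7d}  h = {h:.4f}  "
        f"nh/ln(n) = {n * h / math.log(n):9.1f}  "
        f"ln(1/h)/ln(ln n) = {math.log(1.0 / h) / math.log(math.log(n)):.3f}  "
        f"rate = {uniform_rate(n, h):.5f}"
    )
```

For moderate $n$ the term $\ln(\ln n)$ can still dominate $\ln(1/h)$, so the simplified rate of the Remark is an asymptotic statement.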

The other kernel-type estimator treated by Einmahl and Mason is the kernel regression estimator. For illustration, consider i.i.d. $(d+1)$-dimensional random vectors $(X, Y), (X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, where the $Y$-variables are one-dimensional. We assume that $X$ has a marginal Lebesgue density function $f$ and that the regression function

\[ m(x) = E[Y \mid X = x], \quad x \in \mathbb{R}^d, \]

exists. Let $\hat m_{n,h}(x)$ be the Nadaraya-Watson estimator of $m(x)$ with bandwidth $0 < h < 1$, that is,

\[ \hat m_{n,h}(x) = \frac{\sum_{i=1}^n Y_i K\left( (x - X_i)/h^{1/d} \right)}{\sum_{i=1}^n K\left( (x - X_i)/h^{1/d} \right)}. \]
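The Nadaraya-Watson estimator just defined can be sketched in Python for $d = 1$, in which case $h^{1/d} = h$. The triangular kernel, the regression model $m(x) = x^2$ with Gaussian noise, the sample size, the bandwidth and the seed are assumptions made for this illustration only.

```python
import math
import random


def kernel(u):
    """Continuous symmetric kernel supported on [-1/2, 1/2], integrating to 1."""
    return 2.0 * (1.0 - 2.0 * abs(u)) if abs(u) < 0.5 else 0.0


def nadaraya_watson(x, xs, ys, h):
    """Ratio sum_i Y_i K((x - X_i)/h) / sum_i K((x - X_i)/h), for d = 1."""
    weights = [kernel((x - x_i) / h) for x_i in xs]
    denominator = sum(weights)
    if denominator == 0.0:
        return math.nan  # no observation falls in the kernel window
    return sum(w * y_i for w, y_i in zip(weights, ys)) / denominator


rng = random.Random(3)  # fixed seed, arbitrary choice
xs = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
# Hypothetical model: m(x) = x^2 with centered Gaussian noise of standard deviation 0.1.
ys = [x_i ** 2 + rng.gauss(0.0, 0.1) for x_i in xs]

estimate = nadaraya_watson(0.5, xs, ys, h=0.2)
print("estimate of m(0.5):", round(estimate, 3), "true value:", 0.25)
```

The function returns NaN when the denominator vanishes, which happens when $x$ lies outside the region covered by the sample; restricting attention to a set on which $f$ is strictly positive avoids this situation.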

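With the above setup, we have the following uniform in bandwidth result. Let $K$ and $h$ be as above, and set

\[ r(x, h) = h^{-1} E\left[ Y K\left( \frac{x - X}{h^{1/d}} \right) \right], \qquad f(x, h) = h^{-1} E\left[ K\left( \frac{x - X}{h^{1/d}} \right) \right]. \]

For any subset $I$ of $\mathbb{R}^d$, let $I^\varepsilon$ denote its closed $\varepsilon$-neighborhood with respect to the maximum norm $|\cdot|_+$ on $\mathbb{R}^d$, that is, $|x|_+ = \max_{1 \leq i \leq d} |x_i|$, $x \in \mathbb{R}^d$. Set further, for any function $\psi : \mathbb{R}^d \to \mathbb{R}$, $\|\psi\|_I = \sup_{x \in I} |\psi(x)|$.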

Theorem 6.3. (Einmahl and Mason 2005, p. 1384)

Let $I$ be a compact subset of $\mathbb{R}^d$ and assume that the kernel function $K$ satisfies the conditions of Theorem 6.2. Suppose further that there exists an $\varepsilon > 0$ such that $f$ is continuous and strictly positive on $J := I^\varepsilon$. If we assume that for some $p > 2$,

\[ \sup_{z \in J} E\left( |Y|^p \mid X = z \right) := \alpha < \infty, \]

then we have, for any $C > 0$ and $b_n \searrow 0$, with $\gamma = \gamma(p) = 1 - 2/p$,

\[ \limsup_{n \to \infty} \sup_{C(\ln(n)/n)^\gamma \leq h \leq b_n} \|\hat m_{n,h} - r(\cdot, h)/f(\cdot, h)\|_I = O\left( \sqrt{\frac{\ln(1/h) \vee \ln(\ln n)}{nh}} \right), \]

almost surely.

6.3 Bounds for the moments of linear forms in independent variables

The aim of this section is to give bounds for the absolute moments of linear forms in independent random variables. The first result is due to Whittle (1960, Theorem 2). Consider the linear form $L = \sum_{j=1}^n a_j \zeta_j$, where the $\zeta_j$'s are assumed to be independent mean-zero random variables, not necessarily identically distributed. In what follows, we shall write

\[ C(p) = \frac{2^{p/2}}{\sqrt{\pi}} \int_{-\infty}^{+\infty} |x|^p e^{-x^2}\,dx, \]

and $\gamma_j(p) = \left( E|\zeta_j|^p \right)^{1/p}$, $p > 0$, provided that these quantities exist.

Theorem 6.4. (Whittle, 1960)
The following inequality is valid:

\[ E|L|^p \leq 2^p C(p) \left( \sum_{j=1}^n a_j^2 \gamma_j^2(p) \right)^{p/2}, \]

provided that $p \geq 2$ and the right-hand member exists. Moreover, if all the $\zeta_j$ have symmetric distributions, then the right-hand member may be divided by $2^p$.
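Whittle's inequality can be checked numerically on a small example. The Python sketch below relies on two assumptions that are not part of the text: the closed form $C(p) = 2^{p/2}\,\Gamma((p+1)/2)/\sqrt{\pi}$, obtained from the identity $\int_{-\infty}^{+\infty} |x|^p e^{-x^2}\,dx = \Gamma((p+1)/2)$, and the choice of independent Rademacher signs for the $\zeta_j$ (so that $\gamma_j(p) = 1$) with hypothetical coefficients $a_j = 1/j$. The moment $E|L|^p$ is computed exactly by enumerating all sign patterns.

```python
import itertools
import math


def whittle_constant(p):
    """C(p) = 2^{p/2} * Gamma((p + 1) / 2) / sqrt(pi)."""
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)


def exact_abs_moment(a, p):
    """E|sum_j a_j zeta_j|^p for independent Rademacher zeta_j, by enumeration."""
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=len(a)):
        total += abs(sum(s * a_j for s, a_j in zip(signs, a))) ** p
    return total / 2 ** len(a)


a = [1.0 / j for j in range(1, 11)]  # hypothetical coefficients a_j = 1/j
p = 4.0
sum_sq = sum(a_j ** 2 for a_j in a)  # gamma_j(p) = 1 for Rademacher variables

moment = exact_abs_moment(a, p)
general_bound = 2.0 ** p * whittle_constant(p) * sum_sq ** (p / 2.0)
symmetric_bound = general_bound / 2.0 ** p  # allowed since the zeta_j are symmetric

print("E|L|^4          =", round(moment, 4))
print("general bound   =", round(general_bound, 4))
print("symmetric bound =", round(symmetric_bound, 4))
```

For $p = 4$ the constant $C(4)$ equals 3, the fourth moment of a standard normal variable, so in the symmetric case the bound compares $E|L|^4$ with the fourth moment of a centered normal variable of variance $\sum_j a_j^2$.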
The second result is the Marcinkiewicz-Zygmund inequality (see Chow and Teicher 2003, p. 386). For any $p \geq 1$, let $\|\cdot\|_p$ denote the $L^p$-norm, that is, $\|X\|_p = \left( E|X|^p \right)^{1/p}$ for any random variable $X$ such that $E|X|^p < \infty$.

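Theorem 6.5. (Marcinkiewicz-Zygmund Inequality)

If $\{X_n, n \geq 1\}$ are independent random variables with $E[X_n] = 0$ for all $n$, then for every $p \geq 1$, there exist positive constants $A_p$ and $B_p$ depending only upon $p$ for which

\[ A_p \left\| \left( \sum_{j=1}^n X_j^2 \right)^{1/2} \right\|_p \leq \left\| \sum_{j=1}^n X_j \right\|_p \leq B_p \left\| \left( \sum_{j=1}^n X_j^2 \right)^{1/2} \right\|_p. \]

The proof of this theorem can be found, for example, in Chow and Teicher (2003, p. 386).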
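As a sanity check of the statement, the case $p = 2$ can be worked out directly. Assuming in addition that the $X_j$ have finite second moments, independence and $E[X_j] = 0$ make the cross terms vanish, so the two norms coincide and one may take $A_2 = B_2 = 1$:

```latex
\[
\Bigl\| \sum_{j=1}^n X_j \Bigr\|_2^2
  = E\Bigl[ \Bigl( \sum_{j=1}^n X_j \Bigr)^2 \Bigr]
  = \sum_{j=1}^n E[X_j^2] + \sum_{i \neq j} E[X_i]\, E[X_j]
  = E\Bigl[ \sum_{j=1}^n X_j^2 \Bigr]
  = \Bigl\| \Bigl( \sum_{j=1}^n X_j^2 \Bigr)^{1/2} \Bigr\|_2^2 .
\]
```

For $p \neq 2$ no such identity is available, and the inequality states that the two quantities remain comparable up to constants depending only on $p$.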



Perspectives 



Abstract 

In this section, we sketch some perspectives for possible future research. First, we have seen in our simulation study that the estimator of $f(e)$ introduced in Chapter 3 should be preferred to the one proposed in Chapter 4. However, it would be very interesting to compare the theoretical biases of the two estimators, in order to determine which estimator should be used in a given context.

Our numerical results also reveal a curious situation: the estimator $\hat f_{jn}(e)$ ($j = 1, 2$) is sometimes more efficient than the estimator $\tilde f_{jn}(e)$ when they are studied pointwise. This situation comes from the second-order term of $\hat f_{jn}(e)$, that is $\hat f_{jn}(e) - \tilde f_{jn}(e)$, which may improve the performance of $\hat f_{jn}(e)$. This suggests that the term $\hat f_{jn}(e) - \tilde f_{jn}(e)$ deserves further consideration. We shall also attempt to obtain the uniform weak consistency of the difference $\hat f_{jn}(e) - E_n \hat f_{jn}(e)$.

All the results proposed in this thesis are obtained in the case of a homoscedastic model. Another direction for future research is therefore the extension of our results to a heteroscedastic framework, in which the variance function depends upon the explanatory variable.

Summary

In this part, we outline the research perspectives for our future work. First, the results of our numerical simulations show that the estimator of $f(e)$ introduced in Chapter 3 should be preferred to the one defined in Chapter 4. However, it would be interesting to compare the biases of the two estimators theoretically. This will be one of the problems we address in our subsequent research.

The results of our simulations also show a rather curious point: the estimator $\hat f_{jn}(e)$ ($j = 1, 2$) is sometimes more efficient than the estimator $\tilde f_{jn}(e)$ when they are studied pointwise. This situation is due to the second-order term of $\hat f_{jn}(e)$, that is $\hat f_{jn}(e) - \tilde f_{jn}(e)$, which may improve the performance of $\hat f_{jn}(e)$. This second-order term deserves a more thorough study. We shall also attempt to obtain uniform consistency results for the difference between $\hat f_{jn}(e)$ and $E_n \hat f_{jn}(e)$.

All the results proposed in this thesis were obtained in a homoscedastic regression model. Another direction for our future work will be to see whether comparable results can be obtained in the heteroscedastic model, where the error of the model depends on the explanatory variable.